This repository contains the implementation of MQGAN for audio synthesis. The project is structured to support the entire workflow, from data preparation to model deployment.
Abstract: The aim of this paper is to investigate possible workflows for OOD pattern recognition in AI-based spectrogram analysis, applied in an industrial manufacturing environment. First, we attempt to ...
Abstract: With the rapid advancement of synthetic speech technology, the challenges posed by audio deepfakes have become increasingly severe. Despite notable progress in synthetic speech detection, ...
Diffusion Speech is a diffusion-based text-to-speech model. Our speech synthesis pipeline is quite simple. We use a diffusion transformer model (DiT) to predict the duration of each phoneme. Then we ...