Abstract

Voice Conversion (VC) is the task of converting the speaker-dependent features of a source speaker's speech without changing its linguistic content. Many successful VC systems exist, each addressing particular challenges. These challenges include the unavailability of parallel data, problems caused by language differences between the source and target speech, and the extension of a VC system to cover conversion across many source and target domains at minimal cost. Generative Adversarial Networks (GANs) have shown promising VC results. This work explores many-to-many, non-parallel, GAN-based monolingual VC models (nine highly cited models), explains the objective and subjective evaluation methods used (eight evaluation methods are presented), and comments on these models.
