Autotuned Voice Cloning Enabling Multilingualism

Prof. Sakshi Shejole, Piyush Jaiswal, Samnan Shaikh, Neha Karmal, Vivek Patil

doi:10.22214/ijraset.2023.52906

Abstract

This article describes a neural network-based text-to-speech (TTS) synthesis system that can generate spoken audio in a variety of speaker voices. We show that the proposed model can translate natural-language text into a target language and synthesize speech from the translated text. We quantify the importance of the trained voice modules for obtaining the best generalization performance. Finally, using randomly selected speaker embeddings, we show that speech can be synthesized in new speaker voices not used in training, which indicates that the model has learned high-quality speaker representations. We also introduce a multilingual system and auto-tuner that allow regular text to be translated into another language, making multilingual use possible across a variety of applications.

Full Text