Abstract

For any given mixed-language text, a multilingual synthesizer produces speech that is intelligible to human listeners. However, because speech data are usually collected from native speakers to avoid foreign accents, the synthesized speech switches speaker at every language-switching point. To overcome this, the multilingual speech corpus can be converted to a polyglot speech corpus using cross-lingual voice conversion, and a polyglot synthesizer can then be developed. Cross-lingual voice conversion produces utterances in the target speaker's voice from the source speaker's utterances, irrespective of the language and text spoken by the source and target speakers. Conventional voice conversion techniques based on GMM tokenization suffer from degraded speech quality because statistical averaging oversmooths the spectrum. The current work focuses on alleviating the oversmoothing effect in GMM-based voice conversion by using (source) language-specific mixture weights in a multi-level GMM, followed by selective pole focusing in the unvoiced speech segments. Continuity between the frames of the converted speech is ensured by fifth-order mean filtering in the cepstral domain. In the current work, cross-lingual voice conversion is performed for four regional Indian languages (Tamil, Telugu, Malayalam, and Hindi) and one foreign language (Indian English). The performance of the system is evaluated subjectively, using an ABX listening test for speaker identity and a mean opinion score for quality. Experimental results demonstrate that the proposed method improves quality and intelligibility by mitigating the oversmoothing effect in the voice-converted speech. A hidden Markov model-based polyglot text-to-speech system is also developed using this converted speech corpus, making the system suitable for unrestricted vocabulary.
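The frame-smoothing step mentioned above can be illustrated with a minimal sketch. It assumes that "fifth-order mean filtering" means a centered moving average over five consecutive frames, applied independently to each cepstral coefficient; the function name `mean_filter_cepstra` and the shortened window at the utterance edges are illustrative assumptions, not details taken from the paper.

```python
def mean_filter_cepstra(frames, order=5):
    """Smooth cepstral frames over time with a centered moving average.

    frames: list of frames, each a list of cepstral coefficients.
    order:  window length in frames (assumed odd; 5 for "fifth-order").
    Edge handling is an assumption: the window is truncated at the
    start and end of the utterance instead of being padded.
    """
    if order < 1 or order % 2 == 0:
        raise ValueError("order must be a positive odd integer")
    half = order // 2
    n = len(frames)
    smoothed = []
    for t in range(n):
        # Frames that fall inside the window centered on frame t.
        window = frames[max(0, t - half):min(n, t + half + 1)]
        dim = len(frames[t])
        # Average each coefficient index separately across the window.
        smoothed.append(
            [sum(f[k] for f in window) / len(window) for k in range(dim)]
        )
    return smoothed


if __name__ == "__main__":
    # A single-coefficient track with one abrupt jump between frames.
    track = [[0.0], [0.0], [10.0], [0.0], [0.0]]
    print(mean_filter_cepstra(track))
```

Averaging across neighbouring frames spreads an isolated jump over the window, which is one plausible way to reduce frame-to-frame discontinuities in the converted speech; the paper's actual filter may differ in windowing and edge treatment.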
