Abstract

We propose a cross-language state mapping approach to HMM-based bilingual TTS. Two language-dependent decision trees are first built with a bilingual speech database recorded by a single speaker. A state mapping for every leaf node in the decision tree of a target language is created by finding the nearest leaf node in the tree of a source language, where the Kullback-Leibler divergence between the two state distributions is used as the distance measure. To synthesize target-language speech in a monolingual (source-language) speaker's voice, we use the HMM parameters trained on the monolingual (source-language) speaker's data in the mapped leaf nodes. A similar mapping can be constructed by reversing the source and target languages. With these bidirectional cross-lingual mappings, we can synthesize bilingual or mixed-code speech with HMMs trained on any monolingual speaker's data, and high voice (speaker) similarity is preserved in the synthesized target-language speech. Two perceptual tests on synthesized Mandarin speech confirm high intelligibility, with a Chinese character transcription accuracy of 92.1% and an MOS score of 3.08.
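The core mapping step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each leaf node is summarized by a single diagonal-covariance Gaussian and uses the closed-form (asymmetric) KL divergence; the paper's HMM states and any symmetrization of the divergence are abstracted away, and all function names are hypothetical.

```python
import numpy as np

def kl_gaussian(mu0, var0, mu1, var1):
    """Closed-form KL divergence D(p||q) between two diagonal-covariance
    Gaussians p = N(mu0, var0) and q = N(mu1, var1)."""
    mu0, var0 = np.asarray(mu0, float), np.asarray(var0, float)
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    return 0.5 * np.sum(
        np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0
    )

def nearest_source_leaf(target_leaf, source_leaves):
    """Map one target-language leaf node to the nearest source-language
    leaf node, i.e. the one minimizing the KL divergence."""
    mu_t, var_t = target_leaf
    return min(
        range(len(source_leaves)),
        key=lambda i: kl_gaussian(mu_t, var_t, *source_leaves[i]),
    )

# Toy example: three "source" leaves, one "target" leaf.
source = [
    (np.array([5.0, 5.0]), np.array([1.0, 1.0])),
    (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    (np.array([-3.0, 2.0]), np.array([0.5, 2.0])),
]
target = (np.array([0.1, -0.1]), np.array([1.0, 1.0]))
mapped = nearest_source_leaf(target, source)  # index of the nearest leaf
```

Running this mapping once per target-language leaf yields the full cross-lingual state mapping; reversing the roles of the two trees gives the mapping in the other direction.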
