Speaker Adaptation of a Multilingual Acoustic Model for Cross-Language Synthesis

Ivan Himawan, Simon King, Iris Ouyang, Sam Kang, Sandesh Aryal, Pierre Lanchantin

doi:10.1109/icassp40776.2020.9053642

Abstract

Several studies have shown promising results in adapting DNN-based acoustic models as a mechanism for transferring characteristics from pre-trained models. One example is speaker adaptation from a small amount of data, where fine-tuning has produced models that extrapolate well to diverse linguistic contexts not present in the adaptation data. In the current work, our objective is to synthesize speech in different languages in the target speaker’s voice, regardless of the language of that speaker’s recordings. To achieve this, we create a multilingual model from a corpus of recordings by a large number of monolingual speakers and a few bilingual speakers in multiple languages. The model is then adapted using the target speaker’s recordings in a language other than the target language. We also explore whether additional adaptation data from a native speaker of the target language improves performance. The subjective evaluation shows that the proposed cross-language speaker adaptation approach can synthesize speech in the target language, in the target speaker’s voice, without any data from the target speaker in that language. In addition, extra data from a native speaker of the target language can improve model performance.
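The adaptation recipe summarized above can be illustrated with a toy sketch. This is not the authors’ system: a linear model stands in for the DNN, the linguistic features are assumed to share one space across languages, language identity is assumed to be fed as a one-hot input, and every name and constant (`LinearAcousticModel`, `make_frames`, the speaker offsets) is hypothetical. The model is first trained on several monolingual speakers, then fine-tuned on a few frames from a new speaker recorded only in language 0, and its error on that speaker’s language-1 frames is compared before and after adaptation.

```python
"""Toy sketch of cross-language speaker adaptation by fine-tuning.

Hypothetical illustration only: a linear model stands in for a DNN,
and all names, sizes and constants are invented for this example.
"""
import random
from typing import List, Tuple

Frame = Tuple[List[float], List[float]]  # (model input, acoustic target)
N_LING, N_LANGS = 2, 2  # linguistic-feature size and number of languages

# Invented ground truth: acoustics = A @ linguistic + language effect + speaker offset.
A = [[1.0, -0.5], [0.3, 0.8]]
LANG_EFFECT = [[0.2, -0.1], [-0.3, 0.4]]


def make_frames(offset: List[float], lang: int, n: int, rng: random.Random) -> List[Frame]:
    frames = []
    for _ in range(n):
        ling = [rng.random() for _ in range(N_LING)]
        # Language identity enters the model as a one-hot code (an assumption).
        x = ling + [1.0 if k == lang else 0.0 for k in range(N_LANGS)]
        y = [sum(a * l for a, l in zip(row, ling)) + LANG_EFFECT[lang][j] + offset[j]
             for j, row in enumerate(A)]
        frames.append((x, y))
    return frames


class LinearAcousticModel:
    def __init__(self, n_in: int, n_out: int) -> None:
        self.w = [[0.0] * n_in for _ in range(n_out)]
        self.b = [0.0] * n_out

    def predict(self, x: List[float]) -> List[float]:
        return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(self.w, self.b)]

    def mse(self, data: List[Frame]) -> float:
        # Squared error summed over output dimensions, averaged over frames.
        return sum(sum((p - t) ** 2 for p, t in zip(self.predict(x), y))
                   for x, y in data) / len(data)

    def train(self, data: List[Frame], lr: float = 0.05, epochs: int = 200) -> None:
        # Plain per-frame SGD on squared error; every weight is updated (full fine-tuning).
        for _ in range(epochs):
            for x, y in data:
                err = [p - t for p, t in zip(self.predict(x), y)]
                for j, e in enumerate(err):
                    for i, xi in enumerate(x):
                        self.w[j][i] -= lr * 2 * e * xi
                    self.b[j] -= lr * 2 * e


rng = random.Random(0)

# 1) "Pre-train" on monolingual speakers; speaker k speaks language k % N_LANGS.
pretrain: List[Frame] = []
for spk, offset in enumerate([[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1], [0.05, 0.05]]):
    pretrain += make_frames(offset, lang=spk % N_LANGS, n=16, rng=rng)
model = LinearAcousticModel(N_LING + N_LANGS, len(A))
model.train(pretrain)

# 2) Target speaker: adaptation data in language 0 only; evaluate in language 1.
target_offset = [1.0, -1.0]
adapt = make_frames(target_offset, lang=0, n=8, rng=rng)
test_other_lang = make_frames(target_offset, lang=1, n=8, rng=rng)

before = model.mse(test_other_lang)
model.train(adapt)
after = model.mse(test_other_lang)
print(f"language-1 error before: {before:.3f}, after: {after:.3f}")
```

In this toy only part of the speaker shift carries over to language 1: the bias and the language-0 code are always active together in the adaptation data, so gradient descent splits the correction between them. The sketch is meant to show the data flow of cross-language adaptation, not to reproduce the paper’s results.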
