Realizing Tibetan speech synthesis by speaker adaptive training

Hong-Wu Yang, Zhen-Ye Gan, Keiichi Tokuda, Keiichiro Oura

doi:10.1109/apsipa.2013.6694379

Abstract

This paper presents a method for realizing HMM-based Tibetan speech synthesis within a Mandarin speech synthesis framework. Tibetan sentences are labeled with a Mandarin context-dependent label format, and a Mandarin question set is extended for Tibetan by adding language-specific questions. Using speaker adaptive training (SAT), the Mandarin framework trains an average mixed-lingual model from a large multi-speaker Mandarin corpus and a small single-speaker Tibetan corpus. A speaker adaptation transformation is then applied to this average mixed-lingual model to obtain a speaker-adapted Tibetan model. Experimental results show that this method outperforms a speaker-dependent Tibetan model when only a small number of Tibetan training utterances are available. As the number of Tibetan training utterances increases, the performance of the two methods converges.
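The abstract does not name the adaptation algorithm, so the following is only a minimal sketch of the general idea of "applying a speaker adaptation transformation to an average model". It assumes a single global MLLR-style affine transform of the Gaussian mean vectors, identity covariances, and frame occupancies that are already known (in practice they would come from forward-backward alignment). Under those assumptions, maximizing likelihood reduces to weighted least squares: with extended means xi = [1, mu], G = sum(gamma * xi xi^T) and K = sum(gamma * xi o^T), the transform W = [b A] satisfies G W^T = K. The function names `solve_linear`, `estimate_global_transform`, and `adapt_means` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical frame record: (mixture index, observation vector, occupancy gamma)
Frame = Tuple[int, Sequence[float], float]


def solve_linear(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: too little adaptation data")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0.0:
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def estimate_global_transform(means: Sequence[Sequence[float]],
                              frames: Sequence[Frame]) -> Matrix:
    """Estimate W = [b A] (dim x (dim+1)) by occupancy-weighted least squares.

    Assumes identity covariances, so the ML estimate of an MLLR-style
    mean transform coincides with weighted least squares.
    """
    dim = len(means[0])
    size = dim + 1
    g = [[0.0] * size for _ in range(size)]  # sum gamma * xi xi^T
    k = [[0.0] * dim for _ in range(size)]   # sum gamma * xi o^T
    for m, obs, gamma in frames:
        xi = [1.0] + list(means[m])          # extended mean vector [1, mu]
        for i in range(size):
            for j in range(size):
                g[i][j] += gamma * xi[i] * xi[j]
            for d in range(dim):
                k[i][d] += gamma * xi[i] * obs[d]
    w_t = solve_linear(g, k)                 # (dim+1) x dim, i.e. W^T
    return [[w_t[i][d] for i in range(size)] for d in range(dim)]


def adapt_means(means: Sequence[Sequence[float]], w: Matrix) -> List[Vector]:
    """Map each average-model mean mu to b + A mu."""
    adapted = []
    for mu in means:
        xi = [1.0] + list(mu)
        adapted.append([sum(wi * xv for wi, xv in zip(row, xi)) for row in w])
    return adapted
```

A single global transform is the simplest choice; with more adaptation data, one would typically tie separate transforms to clusters of Gaussians so that each cluster can move differently.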
