A Method for Emotional Speech Synthesis Based on Speaker Adaptive Training

Xin Lu, Hongwu Yang, Yanqin Li

doi:10.1109/iscslp.2018.8706693

Abstract

Emotional speech synthesis aims to make synthesized speech more expressive. To synthesize more natural emotional speech, this paper presents a method for HMM-based emotional speech synthesis built on a Mandarin speech synthesis framework. Emotional sentences are annotated with a Mandarin context-dependent label format, and a Mandarin question set is extended for emotional sentences by adding language-specific questions. Within this framework, speaker adaptive training (SAT) is used to train an average voice model from a large multi-speaker Mandarin corpus and a small single-speaker emotional corpus. A speaker adaptation transformation is then applied to the average voice model to obtain a speaker-adapted emotional model. Experimental results show that, given the same emotional corpus, the proposed method outperforms the speaker-dependent emotional model when the number of Mandarin training utterances is increased.
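The abstract does not say which adaptation transform maps the average voice model to the emotional speaker. HMM-based adaptive systems commonly use linear regression transforms such as MLLR or CSMAPLR for this step. The sketch below illustrates the idea under simplifying assumptions that are not taken from the paper: a single global MLLR-style affine transform of the Gaussian means (mu' = A mu + b), identity covariances, and hard frame-to-Gaussian alignments, under which the maximum-likelihood estimate reduces to ordinary least squares. The function names `adapt_means` and `estimate_affine` are hypothetical.

```python
import numpy as np


def adapt_means(avg_means: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply one shared affine transform mu' = A @ mu + b to every Gaussian mean.

    avg_means: (num_gaussians, dim) means of the average voice model (assumed layout).
    """
    return avg_means @ A.T + b


def estimate_affine(avg_means: np.ndarray,
                    target_frames: np.ndarray,
                    assignments: np.ndarray):
    """Estimate (A, b) from adaptation frames of the target (emotional) speaker.

    Simplifying assumption: identity covariances and hard alignments
    (assignments[t] = index of the Gaussian that frame t belongs to),
    so the MLLR objective becomes an ordinary least-squares problem.
    """
    X = avg_means[assignments]                          # (num_frames, dim)
    X_ext = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    W, *_ = np.linalg.lstsq(X_ext, target_frames, rcond=None)  # (dim + 1, dim)
    A = W[:-1].T   # rows of W[:-1] hold A transposed
    b = W[-1]      # last row is the bias term
    return A, b


if __name__ == "__main__":
    # Toy check with synthetic, noise-free data (not data from the paper).
    rng = np.random.default_rng(0)
    avg = rng.normal(size=(4, 2))
    A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
    b_true = np.array([0.5, -0.3])
    assign = np.array([0, 1, 2, 3, 0, 1, 2, 3])
    frames = adapt_means(avg[assign], A_true, b_true)
    A_hat, b_hat = estimate_affine(avg, frames, assign)
    adapted = adapt_means(avg, A_hat, b_hat)
    print(np.allclose(A_hat, A_true), np.allclose(b_hat, b_true))
```

Real systems estimate the transform from soft state occupancies with full or diagonal covariances, often with regression-class trees and a MAP prior (as in CSMAPLR), so this toy version only shows the shape of the computation.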
