Emotional speech synthesis based on DNN and PAD emotional state model

Weizhao Zhang, Hongwu Yang, Pengpeng Zhi

doi:10.1109/iscslp.2018.8706656

Abstract

An emotional speech synthesis method based on a deep neural network (DNN) and the Pleasure-Arousal-Dominance (PAD) emotional state model is proposed. First, the PAD model is used to annotate the emotional state of a multi-speaker, multi-emotion speech corpus by calculating the distance between the marked emotional point of the speech corpus and the typical emotional points. Second, a DNN-based emotional speech synthesis method is used to generate acoustic features. Third, the prosodic features of the synthesized emotional speech are modified using the PAD model. Finally, the target emotional speech is synthesized by a vocoder. Subjective evaluation results show that, compared with hidden Markov model (HMM)-based and DNN-based methods, the proposed method achieves better performance. Objective tests also show that the spectrum of the emotional speech synthesized by the proposed method is much closer to that of the original speech.
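The annotation step can be illustrated with a minimal sketch. It assumes that each utterance's marked emotion is a point (P, A, D) with every axis scaled to [-1, 1], that the distance is Euclidean, and that the utterance is labeled with its nearest typical emotion. The abstract does not specify the metric or the typical-point coordinates, so `TYPICAL_EMOTIONS`, `pad_distance`, and `annotate` are hypothetical names and the coordinates are illustrative placeholders, not the paper's values.

```python
import math

# Hypothetical placeholder coordinates of typical emotions in PAD space
# (P, A, D each in [-1, 1]). These are NOT the values used in the paper.
TYPICAL_EMOTIONS = {
    "neutral": (0.0, 0.0, 0.0),
    "happy": (0.6, 0.5, 0.4),
    "angry": (-0.5, 0.6, 0.3),
    "sad": (-0.6, -0.3, -0.4),
}


def pad_distance(a, b):
    """Euclidean distance between two (P, A, D) points (assumed metric)."""
    return math.dist(a, b)


def annotate(pad_point, typical=TYPICAL_EMOTIONS):
    """Return the nearest typical emotion and the distance to every typical point."""
    distances = {label: pad_distance(pad_point, ref) for label, ref in typical.items()}
    label = min(distances, key=distances.get)
    return label, distances


if __name__ == "__main__":
    label, distances = annotate((0.5, 0.4, 0.3))
    print(label)  # happy
    for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
        print(f"{name}: {d:.3f}")
```

Returning the full distance table, not just the nearest label, keeps the information needed if the distances are later used as soft emotion weights instead of a hard label. That use is an assumption here and is not stated in the abstract.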
