Deep Learning Based Emotion Recognition from Chinese Speech

Weishan Zhang,Dehai Zhao,Yuanjie Zhang,Xiufeng Chen

doi:10.1007/978-3-319-39601-9_5

Abstract

Emotion Recognition is challenging for understanding people and enhance human computer interaction experiences. In this paper, we explore deep belief networks DBN to classify six emotion status: anger, fear, joy, neutral status, sadness and surprise using different features fusion. Several kinds of speech features such as Mel frequency cepstrum coefficient MFCC, pitch, formant, et al., were extracted and combined in different ways to reflect the relationship between feature combinations and emotion recognition performance. We adjusted different parameters in DBN to achieve the best performance when solving different emotions. Both gender dependent and gender independent experiments were conducted on the Chinese Academy of Sciences emotional speech database. The highest accuracy was 94.6i?ź%, which was achieved using multi-feature fusion. The experiment results show that DBN based approach has good potential for practical usage of emotion recognition, and suitable multi-feature fusion will improve the performance of speech emotion recognition.

Full Text