Abstract

The correlation between speech and emotion makes it possible to build psychological early-warning systems. Because converting speech into semantic information introduces unexpected noise, recognizing emotion directly from the voice signal can be more efficient and natural for speech emotion recognition. The key to improving speech emotion recognition is to introduce more reliable features and thereby achieve better accuracy. To this end, 384-dimensional hand-crafted features extracted from the speech signal serve as the basic features, while multi-dimensional learned features, obtained from the log-mel spectrogram through an improved convolutional neural network combined with a long short-term memory recurrent neural network, serve as the high-level features. Applying the merged features to speech emotion recognition proves effective: compared with relying only on features learned from the log-mel spectrogram, the merged features achieve better performance.
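To make the described fusion concrete, the sketch below shows one plausible way to combine 384-dimensional hand-crafted features with CNN+LSTM features learned from a log-mel spectrogram. It is a minimal illustration under assumptions, not the authors' exact architecture: the layer sizes, the number of mel bands, and the number of emotion classes (`n_classes=4`) are placeholders chosen for the example.

```python
# Minimal sketch of feature fusion for speech emotion recognition.
# Assumptions: 64 mel bands, 4 emotion classes, arbitrary layer sizes.
import torch
import torch.nn as nn

class FusionSER(nn.Module):
    def __init__(self, n_mels=64, handcrafted_dim=384, n_classes=4):
        super().__init__()
        # CNN front end over the log-mel spectrogram (1 x n_mels x time)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # LSTM over the time axis of the CNN feature maps
        self.lstm = nn.LSTM(input_size=64 * (n_mels // 4),
                            hidden_size=128, batch_first=True)
        # Classifier over the merged (learned + hand-crafted) features
        self.classifier = nn.Sequential(
            nn.Linear(128 + handcrafted_dim, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, log_mel, handcrafted):
        # log_mel: (batch, 1, n_mels, time); handcrafted: (batch, 384)
        feat = self.cnn(log_mel)                    # (B, 64, n_mels/4, T/4)
        feat = feat.permute(0, 3, 1, 2).flatten(2)  # (B, T/4, 64 * n_mels/4)
        _, (h_n, _) = self.lstm(feat)               # final hidden state
        merged = torch.cat([h_n[-1], handcrafted], dim=1)
        return self.classifier(merged)

# Example: batch of 8 utterances, 64 mel bands, 200 spectrogram frames
model = FusionSER()
logits = model(torch.randn(8, 1, 64, 200), torch.randn(8, 384))
print(logits.shape)  # torch.Size([8, 4])
```

In this sketch the learned spectrogram features and the hand-crafted feature vector are simply concatenated before classification; the paper's "merged features" idea is of this general form, though the specific network details may differ.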
