Improvement on Speech Emotion Recognition Based on Deep Convolutional Neural Networks

Yafeng Niu, Hua Tan, Yadong Niu, Zhongshi He, Dongsheng Zou

doi:10.1145/3194452.3194460

Abstract

Speech emotion recognition (SER) studies the formation and change of a speaker's emotional state from the perspective of the speech signal, with the aim of making human-computer interaction more intelligent. SER is a challenging task that suffers from limited training data and low prediction accuracy. Here we propose a data processing algorithm based on the imaging principle of the retina and convex lens (DPARIP), which produces spectrograms of different sizes, and hence additional training data, by changing the distance between the spectrogram and the convex lens. In addition, using deep learning to extract high-level features, we apply AlexNet to the IEMOCAP database and achieve an average accuracy of over 48.8% on six emotions. The experimental results indicate that the proposed data preprocessing algorithm is effective and more accurate than existing emotion recognition algorithms.
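
The abstract does not spell out how DPARIP maps lens distance to spectrogram size, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the standard thin-lens relation, under which an object at distance u from a lens of focal length f forms a real image magnified by f / (u - f), and it uses a simple nearest-neighbour resize as a stand-in for whatever interpolation the paper uses. The function names, the focal length of 1.0, and the chosen distances are all hypothetical.

```python
import numpy as np


def lens_magnification(object_distance, focal_length):
    """Thin-lens magnification f / (u - f), valid for u > f (real image)."""
    if object_distance <= focal_length:
        raise ValueError("object_distance must exceed focal_length")
    return focal_length / (object_distance - focal_length)


def resize_nearest(image, scale):
    """Nearest-neighbour resize of a 2-D array by a uniform scale factor."""
    rows, cols = image.shape
    new_rows = max(1, int(round(rows * scale)))
    new_cols = max(1, int(round(cols * scale)))
    # Map each output index back to a source index, clamped to the valid range.
    row_idx = np.minimum((np.arange(new_rows) / scale).astype(int), rows - 1)
    col_idx = np.minimum((np.arange(new_cols) / scale).astype(int), cols - 1)
    return image[np.ix_(row_idx, col_idx)]


def augment_by_distance(spectrogram, distances, focal_length=1.0):
    """Return one rescaled copy of the spectrogram per lens distance (assumed scheme)."""
    return [
        resize_nearest(spectrogram, lens_magnification(d, focal_length))
        for d in distances
    ]
```

Under these assumptions, moving the spectrogram closer to the lens (while staying beyond the focal length) enlarges the image and moving it away shrinks it, so each distance yields a differently sized training sample from the same utterance.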

Full Text