Abstract

Speech emotion recognition is the task of automatically identifying human emotions in spoken utterances. This study addresses speech emotion recognition using deep convolutional neural networks (DCNNs) and extremely randomized trees. Specifically, we propose a method in which a DCNN extracts informative features from the speech signal, and an extremely randomized trees classifier then uses those features to recognize emotions. CNNs are a special variant of conventional feed-forward deep neural networks (DNNs) and have been used in many speech applications. We also propose a second method that integrates the DCNN with i-vectors for emotion recognition. The proposed methods were evaluated on the English IEMOCAP corpus and the German FAU Aibo children's corpus, for the recognition of four and five emotions, respectively. On the IEMOCAP English corpus, the DCNN with extremely randomized trees obtained a 63.9% unweighted average recall (UAR). On the German children's Aibo corpus, a 61.8% UAR was achieved. These results are promising and indicate the effectiveness of the proposed methods for speech emotion recognition. Compared with a baseline approach based on support vector machines (SVMs), the proposed methods showed superior performance.
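The results above are reported as unweighted average recall (UAR), that is, the mean of the per-class recalls, so that every emotion class counts equally regardless of how many utterances it contains. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `unweighted_average_recall` and the toy labels are hypothetical and are not taken from IEMOCAP or FAU Aibo.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of a class = correct predictions / reference samples of that class.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data with four emotion classes.
y_true = ["angry"] * 2 + ["happy"] * 2 + ["neutral"] * 4 + ["sad"] * 2
y_pred = ["angry", "angry", "happy", "sad",
          "neutral", "neutral", "neutral", "happy",
          "sad", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

In this toy case the per-class recalls are 1.0, 0.5, 0.75, and 0.5, so the UAR (0.6875) differs from plain accuracy (0.7) because the larger "neutral" class does not dominate the average.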
