Abstract
Speech emotion recognition is an active research field that has attracted increasing attention from both academia and industry. In this paper, we propose a method that recognizes speech emotions using artificial neural networks (ANNs) and fuses, at the decision level, the recognition results obtained from two kinds of features. Each emotional utterance is first recognized by several individual recognizers. The outputs of these recognizers are then fused using a voting strategy. Furthermore, the dimensionality of the supervectors constructed from spectral features is reduced through principal component analysis (PCA). Experimental results demonstrate that the proposed decision fusion is effective and that the dimensionality reduction is feasible.
Index Terms: speech emotion recognition, ANN, decision fusion
1. Introduction
Speech is a dominant tool for communication and an important, effective means of conveying information and human emotions. With the growing role of speech interfaces in human-machine interaction, speech emotion recognition has become increasingly important. It is an interesting and challenging speech technology that can be applied in many areas, such as call centers [1], the treatment of mental and psychological disorders [2], and the development of education and entertainment software [3].
Speech emotion recognition deals with how to make a computer automatically recognize the emotions in a speech signal by extracting and analyzing acoustic features. A key problem is which kinds of speech features can represent human emotions. Researchers have investigated the relations between features and emotions and have identified many speech features that are useful for emotion recognition. Statistical features based on prosody and voice quality have been widely used in speech emotion recognition and have demonstrated considerable success [4, 5].
Besides statistical features, spectral and cepstral features form another effective group for describing emotional states [6, 7]. Since both kinds of features have played significant roles in speech emotion recognition, it is worth exploring an effective way to fuse them so that they complement each other and further improve recognition performance.
Another key issue in speech emotion recognition is the choice of an effective classification method. Many pattern classification methods have been used for speech emotion recognition [6-9], such as the support vector machine (SVM), k-nearest neighbors (k-NN), the Gaussian mixture model (GMM), the hidden Markov model (HMM), and artificial neural networks (ANNs). These methods are all feasible, but their performance differs considerably. SVM-based methods have been shown to be robust and to perform well, but some recent studies indicate that strong performance may be obtained using ANNs as well. However, it is difficult to determine which kind of ANN is suitable for emotion recognition, and its performance needs to be compared with that of SVMs.
In this paper, an ANN-based decision fusion approach for speech emotion recognition is presented. First, four different ANNs are used to recognize emotions. Then a voting scheme is adopted to fuse, at the decision level, the recognition results obtained with the two kinds of features. Experimental results demonstrate that the proposed approach improves the performance of ANN-based recognition and that its accuracy is comparable with that of the SVM-based method.
The remainder of this paper is organized as follows. Section 2 introduces the features used for speech emotion recognition. Sections 3 and 4 briefly describe the principles of PCA and ANNs, respectively. Section 5 describes the proposed decision fusion. Section 6 presents the experiments and discusses the results. Section 7 draws conclusions and suggests future work.
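Decision-level fusion by voting, as outlined in this introduction, can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `majority_vote`, the example emotion labels, and the tie-breaking rule (defer to one designated recognizer) are all hypothetical, since the text here does not specify how ties are resolved or which emotion classes are used.

```python
from collections import Counter


def majority_vote(predictions, tie_break=0):
    """Fuse the labels that several recognizers assign to one utterance.

    predictions: list of labels, one per recognizer.
    tie_break:   index of the recognizer whose label is preferred when
                 several labels share the highest vote count
                 (an assumed rule, not taken from the paper).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    counts = Counter(predictions)
    top = max(counts.values())
    # Labels that received the highest number of votes, in first-seen order.
    winners = [label for label, c in counts.items() if c == top]

    if len(winners) == 1:
        return winners[0]

    # Tie: prefer the designated recognizer's label if it is among the
    # tied labels, otherwise fall back to the first tied label seen.
    preferred = predictions[tie_break]
    return preferred if preferred in winners else winners[0]


# Hypothetical outputs of four recognizers for a single utterance.
fused = majority_vote(["anger", "anger", "sadness", "joy"])
print(fused)
```

With four recognizers, two-against-two ties are possible, so some explicit tie-breaking rule is needed; the one used here is just one plausible choice.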