Abstract
In this paper, we propose a speech emotion recognition system that uses both spectral and prosodic features. Most traditional systems have focused on either spectral features or prosodic features alone. Since both kinds of features carry emotion information, combining them is expected to improve the performance of an emotion recognition system. For the spectral features, we apply a GMM supervector based SVM. For the prosodic features, we select a set of features that are clearly correlated with emotional states in speech and likewise classify them with an SVM. The combination of the spectral and prosodic features is posed as a data fusion problem to obtain the final decision. Experimental results show that combining the two feature types yields emotion recognition error reduction rates of 18.0% and 52.8% over using only spectral features and only prosodic features, respectively.
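As a rough illustration of the fusion step, the sketch below combines per-class scores from two subsystems with a weighted sum and picks the highest-scoring class. The abstract does not state which fusion rule is used, so the weighted sum, the function names (`fuse_scores`, `predict`, `relative_error_reduction`), the `weight` parameter, and the toy scores are all assumptions made for illustration. The last helper reads "error reduction rate" as a relative reduction in error, which is also an interpretation rather than something the abstract confirms.

```python
import numpy as np


def fuse_scores(spectral_scores, prosodic_scores, weight=0.5):
    """Score-level fusion: weighted sum of per-class scores (assumed rule)."""
    spectral = np.asarray(spectral_scores, dtype=float)
    prosodic = np.asarray(prosodic_scores, dtype=float)
    if spectral.shape != prosodic.shape:
        raise ValueError("score arrays must have the same shape")
    # weight applies to the spectral subsystem, (1 - weight) to the prosodic one
    return weight * spectral + (1.0 - weight) * prosodic


def predict(fused_scores):
    """Return the index of the highest-scoring emotion class per utterance."""
    return np.argmax(fused_scores, axis=-1)


def relative_error_reduction(baseline_error, fused_error):
    """Fraction of the baseline error removed by the fused system."""
    return (baseline_error - fused_error) / baseline_error


# Hypothetical scores for 2 utterances over 3 emotion classes.
spectral = np.array([[0.6, 0.3, 0.1],
                     [0.4, 0.5, 0.1]])
prosodic = np.array([[0.5, 0.2, 0.3],
                     [0.6, 0.1, 0.3]])

fused = fuse_scores(spectral, prosodic, weight=0.5)
labels = predict(fused)
```

In this toy case the spectral subsystem alone would assign the second utterance to class 1, while the fused scores favor class 0, which shows how the prosodic evidence can change the final decision. In practice the weight would be tuned on held-out data.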