Abstract

Affective computing can help us build more intelligent user interfaces by adding the ability to recognize users' emotions. Human speech carries information about the emotional state of the speaker and can therefore serve as input to emotion recognition systems. In this paper, we present a machine learning approach based on acoustic features that improves the accuracy of speech emotion recognition. We used 698 speech samples from the "Emotional Prosody Speech and Transcripts" corpus to train and test the classifiers. The emotion classes were happiness, sadness, hot anger, panic, and neutral. Mel-frequency cepstral coefficients (MFCCs), Teager Energy Operator (TEO) features, and acoustic landmark features were extracted from the speech samples. Models were trained using multinomial logistic regression, k-nearest neighbors (k-NN), and support vector machine (SVM) classifiers. The results show that adding landmark and TEO features to MFCC features improves classification accuracy. SVM classifiers with a Gaussian kernel performed best, with an average accuracy of 90.43%. This is a significant improvement in classification accuracy over a previous study that used the same dataset.
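The abstract describes a feature-extraction-plus-classification pipeline. Below is a minimal illustrative sketch of such a pipeline, assuming librosa for MFCC extraction and scikit-learn for the SVM with a Gaussian (RBF) kernel; the file paths, labels, feature summary statistics, and parameters are hypothetical, not the paper's actual configuration, and the acoustic landmark features are omitted. The TEO is computed from its standard discrete definition.

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np
import librosa
from sklearn.svm import SVC

def teager_energy(x):
    # Discrete Teager Energy Operator: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def extract_features(path):
    y, sr = librosa.load(path, sr=None)
    # 13 MFCCs per frame, summarized by per-coefficient mean and std
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mfcc_stats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    # Coarse TEO summary; landmark features are omitted in this sketch
    teo = teager_energy(y)
    teo_stats = np.array([teo.mean(), teo.std()])
    return np.concatenate([mfcc_stats, teo_stats])

# Hypothetical (wav_path, emotion_label) pairs standing in for the corpus
samples = [("sample_001.wav", "hot_anger"), ("sample_002.wav", "neutral")]
X = np.array([extract_features(p) for p, _ in samples])
labels = np.array([lab for _, lab in samples])

# Gaussian (RBF) kernel, as in the best-performing model reported above
clf = SVC(kernel="rbf")
clf.fit(X, labels)
print(clf.predict(X))
```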
