Abstract

Speech emotion recognition is an indispensable requirement for efficient human-machine interaction. Most modern automatic speech emotion recognition systems use Gaussian mixture models (GMM) and support vector machines (SVM). GMMs are known for their performance and scalability in spectral modeling, while SVMs are known for their discriminative power. A GMM supervector characterizes an emotional style by the GMM parameters (mean vectors, covariance matrices, and mixture weights), so a GMM-supervector SVM benefits from both frameworks. In this paper, the GMM-UBM mean interval (GUMI) kernel based on the Bhattacharyya distance is successfully applied. CFSSubsetEval combined with best-first and greedy stepwise search was also applied in the supervector space to select the most relevant features. The framework is illustrated using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) features on two emotional databases, namely the Surrey Audio-Visual Expressed Emotion (SAVEE) database and the Berlin Emotional Speech Database.
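
To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of a GMM-UBM supervector front end with a GUMI-style mapping and a linear SVM back end. It assumes diagonal-covariance GMMs, mean-only MAP adaptation, and scikit-learn; the feature extraction step (MFCC or PLP frames) and the utterance list are hypothetical placeholders.

```python
# Sketch: GMM-UBM supervectors with a GUMI-style whitening (Bhattacharyya-based),
# classified by a linear SVM. Diagonal covariances, mean-only MAP adaptation.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def train_ubm(all_frames, n_components=64, seed=0):
    """Fit the universal background model on pooled frames (frames x dims)."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=200, random_state=seed)
    ubm.fit(all_frames)
    return ubm

def map_adapt_means(ubm, frames, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one utterance's frames."""
    resp = ubm.predict_proba(frames)              # (T, M) posteriors
    n_i = resp.sum(axis=0)                        # soft counts per mixture
    ex_i = resp.T @ frames                        # (M, D) first-order statistics
    alpha = (n_i / (n_i + relevance))[:, None]    # adaptation coefficients
    safe_n = np.maximum(n_i, 1e-8)[:, None]
    return alpha * (ex_i / safe_n) + (1.0 - alpha) * ubm.means_

def gumi_supervector(ubm, adapted_means):
    """GUMI-style mapping: per-mixture mean offsets whitened by the average of the
    adapted and UBM covariances (diagonal case), then concatenated."""
    # With mean-only adaptation the adapted covariance equals the UBM covariance,
    # so the averaged covariance reduces to the UBM covariance itself.
    avg_cov = ubm.covariances_                    # (M, D) diagonal entries
    offsets = adapted_means - ubm.means_
    return (offsets / np.sqrt(avg_cov)).ravel()   # supervector of length M*D

# Hypothetical usage: utterances is a list of (frames, label) pairs, where the
# frames come from an MFCC or PLP extractor (not shown here).
def build_dataset(ubm, utterances):
    X = np.vstack([gumi_supervector(ubm, map_adapt_means(ubm, f))
                   for f, _ in utterances])
    y = np.array([lab for _, lab in utterances])
    return X, y

def train_emotion_svm(X, y):
    # A linear kernel on GUMI supervectors corresponds to the GUMI kernel,
    # i.e. an inner product between whitened mean-offset supervectors.
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y)
    return clf
```

In this setting, feature selection (e.g. a CFS-style subset evaluator with best-first or greedy stepwise search) would be applied to the supervector dimensions in X before training the SVM.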
