Implementation and Comparison of Speech Emotion Recognition System Using Gaussian Mixture Model (GMM) and K-Nearest Neighbor (K-NN) Techniques

Rahul B Lanjewar, Nilesh Patel, Swarup Mathurkar

doi:10.1016/j.procs.2015.04.226

Open Access

https://doi.org/10.1016/j.procs.2015.04.226

Abstract

Interaction between humans and machines has reached the point where machines are expected to respond with regard to the user's emotional state. Advances in signal processing and machine learning have given machines the capability to recognize human emotions. By combining speech processing with pattern recognition algorithms, an intelligent, emotion-specific man-machine interaction can be achieved, which can be harnessed to design smart and secure automated home as well as commercial applications. This paper focuses on the implementation of a speech emotion recognition system that uses the spectral components of Mel Frequency Cepstrum Coefficients (MFCC), wavelet features of speech, and the pitch of vocal traces. Two machine learning algorithms, the Gaussian Mixture Model (GMM) and K-Nearest Neighbour (K-NN), are used to recognize six emotional categories (happy, angry, neutral, surprised, fearful, and sad) from the standard Berlin emotion database (BES). The performance of the two algorithms is then compared, supported by the confusion matrix.
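The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are synthetic two-dimensional points standing in for per-utterance features such as averaged MFCCs, only three of the six emotions are used, and the GMM is simplified to a single diagonal Gaussian per class. All function names (`make_toy_data`, `knn_predict`, `fit_gaussians`, `gaussian_predict`, `confusion_matrix`) are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter

# Assumption: a three-emotion subset keeps the toy example small.
EMOTIONS = ["happy", "angry", "neutral"]

def make_toy_data(n_per_class, seed):
    """Synthetic 2-D points standing in for real speech features."""
    rng = random.Random(seed)
    centers = {"happy": (2.0, 2.0), "angry": (-2.0, 2.0), "neutral": (0.0, -2.0)}
    data = []
    for label in EMOTIONS:
        cx, cy = centers[label]
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)), label))
    return data

def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_gaussians(train):
    """Fit one diagonal Gaussian per class (a 1-component stand-in for a GMM)."""
    models = {}
    for label in EMOTIONS:
        points = [x for x, y in train if y == label]
        dims = list(zip(*points))
        means = [sum(d) / len(d) for d in dims]
        variances = [
            sum((v - m) ** 2 for v in d) / len(d) + 1e-6
            for d, m in zip(dims, means)
        ]
        models[label] = (means, variances)
    return models

def gaussian_predict(models, x):
    """Pick the class whose Gaussian gives x the highest log-likelihood."""
    def log_likelihood(means, variances):
        return sum(
            -0.5 * (math.log(2 * math.pi * var) + (xi - m) ** 2 / var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(models, key=lambda label: log_likelihood(*models[label]))

def confusion_matrix(true_labels, predicted):
    """Rows are true emotions, columns are predicted emotions."""
    index = {e: i for i, e in enumerate(EMOTIONS)}
    matrix = [[0] * len(EMOTIONS) for _ in EMOTIONS]
    for t, p in zip(true_labels, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

train = make_toy_data(30, seed=0)
test = make_toy_data(10, seed=1)
true_labels = [y for _, y in test]
models = fit_gaussians(train)

knn_cm = confusion_matrix(true_labels, [knn_predict(train, x) for x, _ in test])
gauss_cm = confusion_matrix(true_labels, [gaussian_predict(models, x) for x, _ in test])

for name, cm in (("K-NN", knn_cm), ("Gaussian", gauss_cm)):
    print(name, "accuracy:", accuracy(cm))
    for emotion, row in zip(EMOTIONS, cm):
        print(" ", emotion, row)
```

Each row of a confusion matrix shows how utterances of one true emotion were distributed across the predicted emotions, so off-diagonal counts reveal which emotions a classifier confuses. A real system would replace the synthetic points with features extracted from the speech database and would typically use several mixture components per emotion.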