Emotional speech classification using Gaussian mixture models

D. Ververidis, C. Kotropoulos

doi:10.1109/iscas.2005.1465226

Abstract

The classification of utterances into five basic emotional states is studied. A total of 87 statistical characteristics of pitch, energy, and formants is extracted from 500 utterances of the Danish emotional speech database. The classification capability of each feature is first evaluated in terms of the probability of correct classification achieved by a Bayes classifier that models the feature's probability density function as a mixture of Gaussian densities. Next, the sequential floating forward selection algorithm is used to find the feature subset that yields the highest probability of correct classification. The probability of correct classification is estimated via cross-validation, and the probability density functions are modelled as mixtures of 2 or 3 Gaussian densities. The results demonstrate that the Bayes classifier employing mixtures of 2 Gaussian densities can achieve a probability of correct classification of 0.55. For comparison, the human classification score on this database is 0.67, and random classification among the five classes would give a probability of correct classification of 0.20.
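As a rough illustration of the pipeline summarized above, the sketch below implements a Bayes classifier whose class-conditional densities are Gaussian mixtures fitted by expectation-maximization (EM), a cross-validated estimate of the probability of correct classification, and a basic sequential floating forward selection (SFFS) wrapper that uses that estimate as its criterion. It is a minimal sketch, not the authors' code. The diagonal covariance matrices, the random-sample initialization, the 5-fold cross-validation protocol, and all function names (`fit_gmm`, `fit_bayes`, `predict`, `cv_accuracy`, `sffs`) are assumptions made for illustration. The demo uses synthetic data, not the Danish emotional speech database.

```python
import numpy as np


def _component_log_pdf(X, weights, means, variances):
    """log(w_j) + log N(x | mu_j, diag(var_j)) for every sample (rows) and component (cols)."""
    diff = X[:, None, :] - means[None, :, :]                 # (n, k, d)
    log_det = np.log(2.0 * np.pi * variances).sum(axis=1)    # (k,)
    maha = (diff ** 2 / variances[None, :, :]).sum(axis=2)   # (n, k)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + maha)


def fit_gmm(X, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a k-component Gaussian mixture to X with EM.

    Assumption of this sketch: diagonal covariances, means initialized at random samples.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each sample.
        log_p = _component_log_pdf(X, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate mixing weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        # Variance floor `reg` keeps a component from collapsing onto a single point.
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, 0.0) + reg
    return weights, means, variances


def gmm_log_likelihood(X, params):
    """log p(x) under a fitted mixture, one value per row of X."""
    return np.logaddexp.reduce(_component_log_pdf(X, *params), axis=1)


def fit_bayes(X, y, k):
    """One mixture per class, plus log class priors estimated from label frequencies."""
    classes = np.unique(y)
    models = [fit_gmm(X[y == c], k) for c in classes]
    log_priors = [np.log(np.mean(y == c)) for c in classes]
    return classes, models, log_priors


def predict(X, classes, models, log_priors):
    """Bayes rule: pick the class maximizing log prior + log class-conditional density."""
    scores = np.column_stack([lp + gmm_log_likelihood(X, m) for m, lp in zip(models, log_priors)])
    return classes[np.argmax(scores, axis=1)]


def cv_accuracy(X, y, k=2, n_folds=5, seed=0):
    """Probability of correct classification estimated by n-fold cross-validation (assumed protocol)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        model = fit_bayes(X[train], y[train], k)
        correct += int(np.sum(predict(X[test], *model) == y[test]))
    return correct / len(y)


def sffs(n_features, criterion, max_size):
    """Basic sequential floating forward selection; returns (best score, best subset)."""
    selected, best = [], {}  # best[size] = (score, subset) found so far for that size
    limit = min(max_size, n_features)
    while len(selected) < limit:
        # Forward step: add the feature whose inclusion maximizes the criterion.
        candidates = [g for g in range(n_features) if g not in selected]
        score, f = max((criterion(selected + [g]), g) for g in candidates)
        selected = selected + [f]
        if score > best.get(len(selected), (-np.inf,))[0]:
            best[len(selected)] = (score, selected)
        # Conditional backward steps: drop a feature only while that beats the best smaller subset.
        while len(selected) > 2:
            score, f = max((criterion([g for g in selected if g != h]), h) for h in selected)
            if score <= best[len(selected) - 1][0]:
                break
            selected = [g for g in selected if g != f]
            best[len(selected)] = (score, selected)
    return max(best.values(), key=lambda item: item[0])


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the Danish emotional speech database):
    # 3 classes separated along feature 0; features 1 and 2 are pure noise.
    rng = np.random.default_rng(1)
    y = np.repeat(np.arange(3), 60)
    X = rng.normal(size=(180, 3))
    X[:, 0] += 5.0 * y
    print("all features:", cv_accuracy(X, y, k=2))
    print("SFFS:", sffs(X.shape[1], lambda s: cv_accuracy(X[:, s], y, k=2), max_size=2))
```

Because SFFS calls the criterion once per candidate feature at every step, and each call here re-runs a full cross-validation, the criterion evaluations dominate the cost when the pool is as large as 87 features. Passing `k=3` gives the 3-component variant mentioned in the abstract.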
