Speech emotion recognition based on minimal voice quality features

Agnes Jacob

doi:10.1109/iccsp.2016.7754275

Abstract

This paper presents the results of investigations into speech emotion recognition (SER) in English and Hindi based on micro-perturbations in pitch, called jitter, and very small variations in intensity, called shimmer. Jitter and shimmer are proposed as minimal, reliable and effective features for SER, since it is difficult to produce such minute variations in intensity and pitch artificially, without actually experiencing the emotions. Identifying such a minimal feature set could save time and effort. This is significant in the present SER scenario, where the performance of emotion recognition systems relies on hundreds of features, the collection of which is time-consuming. The investigations were conducted on a database of induced emotional speech from female speakers, developed exclusively for this purpose. A total of 2765 wave files in English and 2240 wave files in Hindi were statistically analyzed. Multiple classifiers were used to validate the classification results. Maximum overall accuracies of 64.8% for English SER and 83.3% for Hindi SER were obtained with an ANN classifier when classifying seven different emotions.
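
To make the two features concrete, the following is a minimal sketch of how jitter and shimmer are commonly computed from cycle-to-cycle measurements. It assumes the widely used "local" definitions (mean absolute difference between consecutive cycles, divided by the mean value); the abstract does not state which variant the author used, and the function names and sample values here are hypothetical. It also assumes that glottal periods and peak amplitudes have already been extracted from the speech signal by some pitch-marking step, which is not shown.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean of all values (a dimensionless ratio)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods):
    """Local jitter: cycle-to-cycle variation of the pitch period."""
    return local_perturbation(periods)


def shimmer_local(amplitudes):
    """Local shimmer: cycle-to-cycle variation of the peak amplitude."""
    return local_perturbation(amplitudes)


# Hypothetical measurements for four consecutive glottal cycles.
periods_ms = [5.0, 5.1, 4.9, 5.0]       # pitch periods in milliseconds
amplitudes = [0.80, 0.82, 0.78, 0.80]   # peak amplitudes (arbitrary units)

print(f"jitter (local):  {100 * jitter_local(periods_ms):.2f} %")
print(f"shimmer (local): {100 * shimmer_local(amplitudes):.2f} %")
```

Under these assumptions, each utterance reduces to two numbers, which could then be passed to any classifier; this illustrates why such a feature set is far cheaper to collect than one with hundreds of features.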

Full Text