Investigation of the effect of spectrogram images and different texture analysis methods on speech emotion recognition

Turgut Özseven

doi:10.1016/j.apacoust.2018.08.003

Abstract

Emotional state detection is an important part of human-machine interaction studies. The features used in emotion recognition are derived from changes in facial expressions and speech signals. In emotion recognition from facial expressions, the face is analyzed with image processing methods. When emotion recognition is performed on speech, the speech is digitized by signal processing methods, and various features are obtained by acoustic analysis. However, because the features obtained by acoustic analysis change differently from one emotion to another, the overall success rate varies. To overcome this limitation, the effect of spectrogram images on emotion recognition is an active field of study. The purpose of this study is to investigate the effects of texture analysis methods and spectrogram images on speech emotion recognition. To this end, spectrogram images of speech were processed with four different texture analysis methods to obtain feature sets. The emotion recognition success rates of these feature sets were investigated experimentally using support vector machines. In addition, the texture analysis methods were compared with acoustic analysis methods. The results show that texture analysis methods can be used for speech emotion recognition. Compared with acoustic analysis, the texture analysis methods gave a 0.4% lower emotion recognition success rate. However, combining both methods increased the success rate.
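As a rough illustration of the pipeline described above (speech signal, then spectrogram image, then texture features), the following minimal sketch computes a spectrogram of a synthetic tone and extracts two texture descriptors from it. Everything specific here is an assumption made for illustration: the abstract does not name the four texture analysis methods, so a gray-level co-occurrence matrix (GLCM) stands in as one common texture descriptor, and the frame length, hop size, number of gray levels, and test signal are hypothetical choices. The classification step with support vector machines is omitted.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram in dB from a naive short-time DFT with a Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Small offset avoids log10(0) for empty bins.
            row.append(20 * math.log10(abs(acc) + 1e-10))
        frames.append(row)
    return frames  # frames[t][k]: time index t, frequency bin k


def quantize(image, levels=8):
    """Map image values linearly to integer gray levels 0..levels-1."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    scale = (hi - lo) or 1.0
    return [[min(levels - 1, int((v - lo) / scale * levels)) for v in row]
            for row in image]


def glcm_features(gray, levels=8):
    """Contrast and energy of a normalised co-occurrence matrix.

    Pixel pairs are taken between each value and its neighbour in the same
    row, i.e. adjacent frequency bins within one time frame (assumed offset).
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in gray:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            total += 1
    p = [[c / total for c in row] for row in counts]
    contrast = sum(p[i][j] * (i - j) ** 2
                   for i in range(levels) for j in range(levels))
    energy = sum(p[i][j] ** 2
                 for i in range(levels) for j in range(levels))
    return {"contrast": contrast, "energy": energy}


# Hypothetical input: a 440 Hz tone sampled at 8 kHz stands in for speech.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1024)]

spec = spectrogram(signal)
gray = quantize(spec)
features = glcm_features(gray)
print(features)
```

In a full experiment, a feature vector of this kind would be computed per utterance and passed, alone or concatenated with acoustic features, to a classifier such as a support vector machine.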

Full Text