Speech Emotion Recognition Using Feature Fusion of TEO and MFCC on Multilingual Databases

Syed Asif Ahmad Qadri, Mira Kartiwi, Hasmah Mansor, Taiba Majid Wani, Teddy Surya Gunawan

doi:10.1007/978-981-33-4597-3_61

Abstract

Emotion is considered one of the most critical elements of the speech signal, and the field of speech emotion recognition emerged to recognize it. Speech Emotion Recognition (SER) has become an area of research interest in the last few years. A typical SER system extracts features such as pitch frequency, formant features, energy-related features, and spectral features from speech, then passes them to a classification stage that predicts the emotion class. The critical issue for a successful SER system is emotional feature extraction, which can be addressed with different feature extraction techniques. In this paper, along with the Teager Energy Operator (TEO) and Mel Frequency Cepstral Coefficients (MFCC), a novel feature extraction method that fuses MFCC and TEO, called Teager-MFCC (T-MFCC), is used for the recognition of energy-based emotions. We have used three emotion corpora, in German, English, and Hindi, to develop the multilingual SER system. The energy-based emotions are classified by a Deep Neural Network (DNN). It is found that TEO achieves a better recognition rate than MFCC and T-MFCC.

Keywords: Speech emotion recognition, Deep neural network, Multilingual database, TEO, MFCC
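For readers unfamiliar with TEO, the following is a minimal sketch of the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It is not taken from the paper: the function name `teager_energy`, the test signal, and its parameters are illustrative assumptions, and the abstract does not specify how the authors frame the signal or combine TEO with MFCC to form T-MFCC.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator (illustrative helper, not from the paper).

    Computes psi[n] = x[n]^2 - x[n-1] * x[n+1] for interior samples only,
    so the result has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Assumed test signal: a pure cosine with amplitude A and digital frequency omega.
amplitude, omega = 0.5, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(100)]

psi = teager_energy(signal)

# For a pure sinusoid the operator output is constant: A^2 * sin^2(omega).
expected = amplitude ** 2 * math.sin(omega) ** 2
```

For a pure sinusoid the output depends on both amplitude and frequency, which is one reason the operator is commonly described as tracking the energy of the source that produced the signal rather than only the squared amplitude.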

Full Text