Speech Emotion Recognition Using Deep Neural Networks on Multilingual Databases

Syed Asif Ahmad Qadri, Taiba Majid Wani, Eliathamby Ambikairajah, Teddy Surya Gunawan, Eko Ihsanto, Mira Kartiwi

doi:10.1007/978-3-030-70917-4_3

Abstract

With the research community’s ever-increasing interest in human-computer interaction (HCI), systems that deduce and identify the emotional aspects of a speech signal have emerged as a hot research topic. Speech Emotion Recognition (SER) has made automated and intelligent analysis of human utterances a reality. Typically, an SER system extracts features from speech signals, such as pitch frequency, formant features, and energy-related and spectral features, and then passes them to a classification stage that identifies the underlying emotion. The success of an SER system depends chiefly on selecting appropriate emotional feature extraction techniques. In this paper, the Mel-frequency Cepstral Coefficient (MFCC) and the Teager Energy Operator (TEO), along with a newly proposed feature fusion of MFCC and TEO referred to as Teager-MFCC (TMFCC), are examined over a multilingual database consisting of English, German, and Hindi. Deep Neural Networks are used to classify the four emotions considered: happy, sad, angry, and neutral. Evaluation results show that the proposed fusion TMFCC, with a recognition rate of 92.7%, outperforms TEO and MFCC, which achieve recognition rates of 88.5% and 90.0%, respectively.

Keywords

Speech Emotion Recognition; Mel-frequency Cepstral Coefficient (MFCC); Teager Energy Operator (TEO); Deep Neural Networks (DNN)
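As a brief illustration of one of the features named in the abstract, the discrete-time Teager Energy Operator is commonly defined as Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The following is a minimal sketch of that standard definition only; the function name `teager_energy` is hypothetical, and the sketch does not attempt to reproduce the paper's TMFCC fusion, whose details are not given in the abstract.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Hypothetical helper for illustration; returns len(x) - 2 values,
    because the first and last samples lack a neighbour on one side.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 3:
        raise ValueError("expected a 1-D signal with at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Example: for a pure sinusoid A*cos(w*n + phi), the operator yields the
# constant A^2 * sin(w)^2, which tracks both amplitude and frequency.
n = np.arange(100)
amplitude, omega = 2.0, 0.3
signal = amplitude * np.cos(omega * n + 0.1)
psi = teager_energy(signal)
```

Because the output depends on both the amplitude and the frequency of the signal, the operator is often used as an energy measure that is sensitive to changes in how speech is produced, which is one plausible reason for pairing it with spectral features such as MFCC.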
