Abstract

In the past decade, research for improving man–machine communication has focused on emotion recognition using audio cues. Several effective monolingual, multilingual, and cross-corpus speech emotion recognition (SER) systems have been developed; however, they are limited to recognizing emotions from databases of monolingual discourse, primarily in either categorical or dimensional emotion space. For multilingual countries and federations such as India, Russia, and the European Union, these limitations can be problematic. Furthermore, in an environment of mixed diversified languages, the performance of existing models is unclear. To address these issues, we propose an innovative mixed-lingual SER system that considers five diverse languages, including dialect variability. Mixed-lingual corpora are developed from available standard speech emotion databases. Furthermore, a compact feature set having a unique set of speech feature functionals with a distinctive set of enhanced perceptual features and modified H-coefficients is proposed. Against existing large feature sets, the proposed compact feature set is robust and effective to perform the dual task of significantly recognizing different emotions from multilingual SER systems also, along with the mixed-lingual SER systems in both emotion spaces of categorical and dimensional. In the proposed SER system, to overcome the skewness of SER system performance for recognizing certain emotions, a data augmentation method is then incorporated. Furthermore, the proposed SER system is designed to efficiently recognize even the extreme emotions of boredom, disgust, sadness, and surprise in both emotion spaces. The proposed SER system is compared to existing SER systems, and the comparison results demonstrate that the proposed system outperformed existing systems.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call