Abstract

Speech Emotion Recognition (SER) has advanced considerably over the past 20 years. To date, various SER systems have been developed for monolingual, multilingual and cross-corpus contexts. However, in a country like India, where numerous languages are spoken and people often converse in more than one language, a dedicated SER system for the mixed-lingual scenario is needed, and establishing one is the focus of this work. A self-recorded database of speech emotion samples in 11 diverse Indian languages has been developed. In parallel, a mixed-lingual database is formed from three popular standard databases, Berlin, Baum and SAVEE, to represent a mixed-lingual environment for a Western background. A detailed investigation of the GeMAPS (Geneva Minimalistic Acoustic Parameter Set) feature set for mixed-lingual SER is performed. A distinct set of Mel Frequency Cepstral Coefficients (MFCCs) derived from sine- and cosine-based filter banks enriches the GeMAPS feature set and is shown to be robust for mixed-lingual emotion recognition. Various Machine Learning (ML) and Deep Learning (DL) algorithms are applied for emotion recognition. The experimental results demonstrate that GeMAPS features classified with ML algorithms are robust in recognizing all the emotions across the mixed-lingual database of Western languages. However, given the diverse recording conditions and languages of the self-recorded Indian database, the enriched GeMAPS features classified with DL prove to be significant for mixed-lingual emotion recognition.
