Abstract
Over the years, emotion recognition has become a cornerstone of affective computing, enabling machines to recognize and respond to human emotions. This study presents an approach to classifying emotions accurately by exploiting two modalities: audio and text. The main challenges in this domain are noisy speech signals and inherently ambiguous textual expressions, both of which tend to reduce the accuracy of unimodal systems. Classic approaches make poor use of the complementary nature of these modalities, which motivates a robust, combined framework. This study proposes SVM-ERATI, a Support Vector Machine (SVM)-based emotion recognition (ER) approach that takes audio and text information (ATI) as input. The extracted audio features include Mel-frequency cepstral coefficients (MFCCs) and prosodic features such as pitch and energy, which reflect the acoustic properties of emotion. Text is represented by semantic embeddings obtained from transformer models such as BERT. A feature-level fusion scheme then combines the audio and text feature vectors into a single integrated representation. The fused features are classified by a multi-class SVM with a radial basis function (RBF) kernel, chosen because it can capture the non-linear relationships inherent in multimodal emotional data. Experiments on benchmark datasets such as CMU-MOSEI show that the proposed multimodal SVM approach significantly outperforms unimodal baselines, by about 12%. These findings highlight the effectiveness of SVMs in combining audio and text data for emotion recognition, with promising implications for AI-powered mental health diagnostics and intelligent virtual assistants.
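The feature-level fusion and RBF kernel described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature sizes (15 audio values standing in for 13 MFCC means plus mean pitch and energy, and a 768-dimensional BERT-base embedding), the per-modality z-score normalization, the helper names, and the gamma value are all hypothetical choices. The block shows concatenation-based fusion and the kernel matrix that an RBF SVM would evaluate on the fused vectors; the multi-class classifier itself would typically come from a library such as scikit-learn's `SVC(kernel="rbf")`.

```python
import numpy as np


def zscore(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize each column to zero mean and unit variance.

    Assumption: statistics are computed on the array passed in; a real
    pipeline would fit them on the training split and reuse them at test time.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)


def fuse_features(audio: np.ndarray, text: np.ndarray) -> np.ndarray:
    """Feature-level (early) fusion: normalize each modality, then concatenate.

    audio: (n_utterances, n_audio_features), e.g. MFCC + pitch + energy statistics
    text:  (n_utterances, n_text_features), e.g. a BERT sentence embedding
    """
    if audio.shape[0] != text.shape[0]:
        raise ValueError("audio and text must have one row per utterance")
    return np.concatenate([zscore(audio), zscore(text)], axis=1)


def rbf_kernel(x: np.ndarray, y: np.ndarray, gamma: float) -> np.ndarray:
    """RBF (Gaussian) kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * (x @ y.T)
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


# Hypothetical usage with random stand-in features (no real data).
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 15))   # assumed: 13 MFCC means + mean pitch + mean energy
text = rng.normal(size=(4, 768))   # assumed: 768-dim BERT-base embedding
fused = fuse_features(audio, text)
gamma = 1.0 / fused.shape[1]       # assumed starting value; tune in practice
K = rbf_kernel(fused, fused, gamma)
print(fused.shape)  # (4, 783)
print(K.shape)      # (4, 4)
```

Standardizing each feature before concatenation keeps raw-scale differences (for example, pitch in Hz versus small-magnitude embedding values) from dominating the squared distance inside the kernel. In practice, gamma and the SVM's regularization parameter would be tuned, for instance by cross-validation.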