Abstract

In the evolving field of Speech Emotion Recognition (SER), which is increasingly applied to understanding and addressing mental health issues, conventional models often falter in interpreting complex emotional states, particularly those related to conditions such as PTSD. This study introduces the Cognitive Emotion Fusion Network (CEFNet), a novel hybrid SER model integrating Improved and Faster Region-based Convolutional Neural Networks (IFR-CNN), Deep Convolutional Neural Networks (DCNNs), Deep Belief Networks (DBNs), and the Bird's Nest Learning Analogy (BNLA). Designed to overcome the limitations of traditional models, CEFNet focuses on accurately interpreting nuanced emotional expressions through advanced machine learning techniques and comprehensive feature extraction. Evaluated on the EMODB and RAVDESS datasets, CEFNet demonstrated superior performance, achieving accuracies of 98.11% and 91.17%, respectively, and outperforming existing models in precision and F1 score. This research marks a significant contribution to SER, particularly in mental health applications, offering a robust framework for emotion recognition in speech. It opens avenues for future enhancements, including broader applicability across languages and cultural contexts, optimization for resource-limited environments, and integration with other modalities for more holistic emotion recognition.
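The abstract does not include implementation details of CEFNet, so the following is only a minimal, illustrative sketch of how one branch of such a hybrid SER pipeline might look: MFCC features extracted from a waveform feeding a small 2-D CNN that outputs emotion logits. The class name `DCNNBranch`, the 16 kHz sample rate, the seven-class output, and all layer sizes are assumptions for illustration, not the authors' architecture, and the IFR-CNN, DBN, and BNLA components are not reproduced here.

```python
# Illustrative stand-in for the DCNN branch of a hybrid SER model.
# All hyperparameters and names are assumed; this is not the CEFNet code.
import torch
import torch.nn as nn
import torchaudio

NUM_EMOTIONS = 7  # e.g. the seven EMODB emotion classes (assumed)


class DCNNBranch(nn.Module):
    def __init__(self, num_classes: int = NUM_EMOTIONS):
        super().__init__()
        # MFCC front end: 40 coefficients from 16 kHz mono audio (assumed settings)
        self.mfcc = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=40)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) mono audio at 16 kHz
        feats = self.mfcc(waveform).unsqueeze(1)  # (batch, 1, n_mfcc, frames)
        x = self.conv(feats).flatten(1)
        return self.head(x)                       # emotion logits per clip


if __name__ == "__main__":
    model = DCNNBranch()
    dummy = torch.randn(2, 16000)                 # two 1-second dummy clips
    print(model(dummy).shape)                     # torch.Size([2, 7])
```

In a fusion architecture of the kind the abstract describes, the logits (or intermediate features) from several such branches would be combined before the final emotion decision; the fusion strategy itself is not specified in the abstract.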
