Abstract

Speech emotion recognition (SER) is a prominent and dynamic research field in data science owing to its extensive applications in domains such as psychological assessment, mobile services, and computer games. Numerous previous studies have utilized manually engineered features for emotion classification and achieved commendable accuracy. However, these features tend to underperform in complex scenarios, leading to reduced classification accuracy. Such scenarios include: (1) datasets containing diverse speech patterns, dialects, accents, or variations in emotional expression; (2) data with background noise; (3) cases where the distribution of emotions varies significantly across datasets; and (4) combined datasets from different sources, which introduce complexities due to variations in recording conditions, data quality, and emotional expression. Consequently, there is a need to improve the classification performance of SER techniques. To address this, a novel SER framework is introduced in this study. Prior to feature extraction, signal preprocessing and data augmentation methods were applied to enlarge the available data, and 18 informative features were then derived from each signal. A discriminative feature set was obtained using feature selection techniques and used as input for emotion recognition on the SAVEE, RAVDESS, and EMO-DB datasets. Furthermore, this research also implemented a cross-corpus model that incorporated all speech files related to the emotions common to the three datasets. The experimental outcomes demonstrated the superior performance of the proposed SER framework compared to existing frameworks in the field. Specifically, the proposed model obtained accuracies of 95%, 94%, 97%, and 97% on the SAVEE, RAVDESS, EMO-DB, and cross-corpus datasets, respectively. These results underscore the significant contribution of the proposed framework to the field of SER.
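To make the described pipeline concrete, the following is a minimal sketch in Python of one plausible implementation. The abstract does not specify which 18 features are extracted, nor the augmentation methods, selection technique, or classifier used, so the librosa feature set (13 MFCC means plus five spectral and temporal statistics), the additive-noise/pitch-shift/time-stretch augmentations, the SelectKBest selector, and the random-forest classifier below are illustrative assumptions, not the authors' exact method.

    # Hedged sketch of an SER pipeline: augmentation -> 18 features ->
    # feature selection -> classification. All concrete choices here are
    # assumptions; the abstract only names the pipeline stages.
    import numpy as np
    import librosa
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    def augment(y, sr):
        """Yield the original signal plus simple augmented copies."""
        yield y
        yield y + 0.005 * np.random.randn(len(y))                 # additive noise
        yield librosa.effects.pitch_shift(y, sr=sr, n_steps=2)    # pitch shift
        yield librosa.effects.time_stretch(y, rate=1.1)           # time stretch

    def extract_features(y, sr):
        """Derive an 18-dimensional vector from one signal:
        13 MFCC means + 5 spectral/temporal statistics (an assumed set)."""
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
        extras = [
            librosa.feature.zero_crossing_rate(y).mean(),
            librosa.feature.rms(y=y).mean(),
            librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
            librosa.feature.spectral_rolloff(y=y, sr=sr).mean(),
            librosa.feature.spectral_bandwidth(y=y, sr=sr).mean(),
        ]
        return np.concatenate([mfcc, extras])

    def build_dataset(files, labels, sr=16000):
        """Load each file, expand it via augmentation, and featurize."""
        X, Y = [], []
        for path, label in zip(files, labels):
            y, _ = librosa.load(path, sr=sr)
            for y_aug in augment(y, sr):
                X.append(extract_features(y_aug, sr))
                Y.append(label)
        return np.array(X), np.array(Y)

    # Feature selection keeps the most discriminative subset of the 18
    # features before classification; k and the classifier are assumptions.
    model = make_pipeline(
        SelectKBest(f_classif, k=12),
        RandomForestClassifier(n_estimators=300, random_state=0),
    )
    # X, Y = build_dataset(wav_paths, emotion_labels)
    # print(cross_val_score(model, X, Y, cv=5).mean())

A cross-corpus variant of this sketch would simply pool the files for the emotions shared across SAVEE, RAVDESS, and EMO-DB before calling build_dataset.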
