Abstract
Speech emotion recognition plays a crucial role in the analysis of psychological disorders, in behavioral decision-making, and in human-machine interaction. However, most current methods for speech emotion recognition are data-driven, and the scarcity of emotional speech datasets limits progress in emotion analysis and recognition. To address this issue, this study introduces a new English speech dataset designed specifically for emotion analysis and recognition. The dataset consists of 5503 voice samples from more than 60 English speakers in different emotional states. For feature extraction, the fast Fourier transform (FFT), short-time Fourier transform (STFT), mel-frequency cepstral coefficients (MFCCs), and continuous wavelet transform (CWT) are applied to the speech data. The resulting spectral images form four datasets, each consisting of a different type of speech feature image. To evaluate the dataset, 16 classification models and 19 detection algorithms are selected. The experimental results show that most of these classification and detection models achieve high recognition accuracy on the dataset, supporting its effectiveness and utility for research and development in emotion recognition.
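As an illustration of the kind of time-frequency feature named above, the following is a minimal, dependency-free sketch of an STFT magnitude spectrogram. It is not the authors' pipeline: the function names, the Hann window, the frame length of 64 samples, the hop of 32 samples, and the synthetic 1 kHz test tone are all assumptions chosen for illustration. In practice a numerical library would replace the naive DFT used here.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of a real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Hann-windowed STFT magnitudes: one list of frequency bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(chunk))
    return frames


# Synthetic stand-in for a speech clip (assumed values, not from the dataset):
# a 1 kHz sine tone sampled at 8 kHz, 256 samples long.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spectrogram = stft_magnitudes(signal)

# Locate the strongest bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spectrogram` corresponds to one time frame and each column to one frequency bin. Rendering that grid as an image gives a spectral image of the general kind the abstract describes, although the paper's actual parameters and rendering are not stated here.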