Abstract
Determining a speaker's emotion from their speech utterance by machine is referred to as Speech Emotion Recognition (SER). It can greatly enhance the human-machine interaction experience. However, such systems suffer from poor performance due to factors like variations in emotion intensity, speaker, language, and culture. In this study, we use the RAVDESS dataset, which consists of speech utterances at two emotional intensities, strong and normal, raising the recognition difficulty and making it a suitable basis for developing an efficient SER framework. This work proposes Gender Dependent Training for building the emotion detection models in the SER system. The proposed system is less complex and demonstrates good performance using only MFCC features and their variants, namely delta MFCC and delta-delta MFCC, compared with the baseline system, which utilized five different features: MFCC, Mel, Chromagram, Spectral Contrast, and Tonnetz. When the average recognition accuracy over six emotions (Sad, Fearful, Calm, Surprised, Disgust, and Happy) is considered, the proposed system shows a 6.90% improvement over the considered baseline system.
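To make the feature set concrete, the following is a minimal illustrative sketch (not the authors' implementation) of extracting MFCC, delta MFCC, and delta-delta MFCC features with the librosa library; the choice of 40 coefficients and the time-averaging step are assumptions, not details taken from the paper.

    # Illustrative sketch only: MFCC + delta + delta-delta extraction.
    # n_mfcc=40 and mean-pooling over time are assumed choices.
    import numpy as np
    import librosa

    def extract_features(wav_path, n_mfcc=40):
        """Return a fixed-length vector of time-averaged MFCCs and
        their first- and second-order deltas, concatenated."""
        y, sr = librosa.load(wav_path, sr=None)        # keep native sample rate
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        delta = librosa.feature.delta(mfcc)            # first-order derivative
        delta2 = librosa.feature.delta(mfcc, order=2)  # second-order derivative
        # Average each coefficient track over time for a fixed-length vector.
        return np.concatenate([mfcc.mean(axis=1),
                               delta.mean(axis=1),
                               delta2.mean(axis=1)])

Under gender-dependent training, such vectors would be grouped by speaker gender before fitting separate emotion classifiers, rather than pooling all speakers into one model.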