Abstract

Emotions are often easy to identify in human-to-human encounters because they are conveyed through speech, body language, and facial expressions. Recognising human emotion in human-computer interaction (HCI), however, remains difficult. Speech emotion recognition (SER), which aims to identify emotions from vocal intonation alone, has emerged as a way to enhance this interaction. In this paper, a SER system based on deep learning methodologies is proposed. The RAVDESS dataset was utilised to evaluate the proposed system, and Mel-frequency cepstral coefficients (MFCCs) were used to select the vocal features that best represent speech emotions. Three deep learning models were built for the SER system: an LSTM, a CNN, and a hybrid model combining CNN and LSTM. By comparing these approaches, we identified the most effective model for recognising emotional states from speech in real-time settings. Overall, this work demonstrates the usefulness of the proposed deep learning approach.
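As an illustrative sketch only, the pipeline the abstract describes (MFCC features feeding a hybrid CNN-LSTM classifier) can be approximated in Python using librosa for feature extraction and Keras for the model. The abstract does not state the paper's actual architecture or hyperparameters, so every layer size, frame count, and the 8-class output (matching RAVDESS's eight emotion labels) below is an assumption.

```python
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers

def extract_mfcc(path, n_mfcc=40, max_frames=174):
    # Load an audio clip and compute MFCCs; pad or truncate the time
    # axis so every utterance yields a fixed-size input.
    # (n_mfcc and max_frames are assumed values, not from the paper.)
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    mfcc = mfcc[:, :max_frames]
    pad = max_frames - mfcc.shape[1]
    if pad > 0:
        mfcc = np.pad(mfcc, ((0, 0), (0, pad)))
    return mfcc.T  # (frames, n_mfcc): time steps x features

def build_hybrid_model(n_frames=174, n_mfcc=40, n_classes=8):
    # Hypothetical CNN-LSTM hybrid: Conv1D layers learn local spectral
    # patterns in the MFCC frames, and the LSTM models how those
    # patterns evolve over the course of the utterance.
    return tf.keras.Sequential([
        layers.Input(shape=(n_frames, n_mfcc)),
        layers.Conv1D(64, 5, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, activation="relu"),
        layers.MaxPooling1D(2),
        layers.LSTM(128),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_hybrid_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The division of labour in this sketch reflects the rationale for a hybrid model: convolutional layers act as learned local feature detectors over the MFCC sequence, while the recurrent layer captures the longer-range temporal dynamics that distinguish emotional states.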
