Abstract

Speech emotion recognition is one of the most challenging tasks in audio processing. In this work, we introduce an architecture that extracts MFCC and Mel-spectrogram features from audio files and feeds them to a machine learning algorithm (SVM) and deep learning algorithms (CNN-1D and CNN-2D) to identify emotions in sample recordings from the Berlin dataset (EMO-DB). To improve classification performance, we applied an incremental methodology to our initial model; both the machine learning and deep learning models can also work directly with raw data. Based on our preliminary results, the best-performing model establishes a new development set for our existing EMO-DB framework. The proposed framework achieved 92.5% accuracy on the 535 EMO-DB samples across 7 emotion classes, making it a promising approach to emotion recognition. The deep neural network in particular helps to extract the most relevant features from the audio files for emotion classification.
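
As a rough illustration of the feature-extraction step described above, the following Python sketch loads one audio file and computes both representations. The abstract does not name its tooling; librosa is assumed here, and the parameter values (sr=16000, n_mfcc=40, n_mels=128) are illustrative choices, not taken from the paper.

import numpy as np
import librosa

def extract_features(path, sr=16000, n_mfcc=40, n_mels=128):
    """Load one EMO-DB wav file and return its MFCC and log-Mel-spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    # MFCCs: a compact (n_mfcc x frames) matrix, suitable for the SVM or CNN-1D
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Log-scaled Mel-spectrogram: a 2-D time-frequency "image" for the CNN-2D
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    return mfcc, log_mel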
