Abstract
A Speech Emotion Recognition (SER) system is a collection of methods for processing and classifying speech signals in order to detect the emotions they carry. Such a system can be beneficial in several sectors, including interactive voice-based assistants and caller-agent conversation analysis. We aim to reveal the underlying emotions in recorded speech by analyzing the acoustic features of the audio data. The majority of Emotion Recognition research has concentrated on speech descriptors such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), energy, spectral flux, spectral centroid, spectral roll-off, and zero-crossing rate, followed by the application of machine learning classifiers such as SVM and Naïve Bayes, or an ensemble of several such classifiers. Other works have recast the speech emotion recognition problem as an image recognition problem and applied convolutional neural network (CNN) architectures, but evaluated only the MFCC images of the audio signals. In our approach, we generate spectrogram images from the audio samples to train our CNN architecture. A spectrogram is a graphical representation of the signal strength, or ‘loudness,’ of a signal over time at the various frequencies present in a waveform. We also compared the results with a CNN model trained on MFCC images of the same dataset; the MFCC image CNN model outperformed our spectrogram CNN model by 3.75%, reaching an accuracy of 82.5%. Code: https://github.com/sambhavi10/Speech-Emotion-Recognition
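To make the spectrogram-image step concrete, the sketch below shows one common way to turn an audio clip into a mel-spectrogram image suitable for a CNN. This is a minimal illustration, not the authors' exact pipeline: it assumes the librosa and matplotlib libraries, and the file names, image size, and number of mel bands are placeholder choices.

```python
# Minimal sketch (assumptions noted above): convert a speech clip into a
# mel-spectrogram image that a CNN could consume.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def save_spectrogram_image(wav_path, out_path, n_mels=128):
    """Load a speech clip and save its mel spectrogram as an image file."""
    y, sr = librosa.load(wav_path, sr=None)           # keep the native sampling rate
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S_db = librosa.power_to_db(S, ref=np.max)         # convert power to decibels ("loudness")

    fig, ax = plt.subplots(figsize=(3, 3))
    librosa.display.specshow(S_db, sr=sr, ax=ax)      # time x frequency representation
    ax.axis("off")                                    # drop axes so only pixels remain
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)

# Hypothetical usage: save_spectrogram_image("sample_clip.wav", "sample_clip.png")
```

The same library also exposes `librosa.feature.mfcc`, which could be rendered in an analogous way to produce the MFCC images used in the comparison model.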