Abstract
A Speech Emotion Recognition (SER) system is a collection of methods for processing and classifying speech signals in order to detect the emotions they carry. Such a system can be beneficial in several sectors, including interactive voice-based assistants and caller-agent conversation analysis. We aim to reveal the underlying emotions in recorded speech by analyzing the acoustic features of the audio data. The majority of Emotion Recognition research has concentrated on speech descriptors such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), energy, spectral flux, spectral centroid, spectral roll-off, and zero-crossing rate, followed by the application of machine learning classifiers such as SVM and Naïve Bayes, or an ensemble of several such classifiers. Other works have recast the speech emotion recognition problem as an image recognition problem and applied convolutional neural network (CNN) architectures, but evaluated only the MFCC images of the audio signals. In our approach, we generate spectrogram images from the audio samples to train our CNN architecture. A spectrogram is a graphical representation of the signal strength, or ‘loudness,’ of a signal over time at the various frequencies present in a waveform. We also compared the results with a CNN model trained on MFCC images of the same dataset; the MFCC image CNN model outperformed our spectrogram CNN model by 3.75%, reaching an accuracy of 82.5%. Code: https://github.com/sambhavi10/Speech-Emotion-Recognition
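To make the spectrogram-image step concrete, the sketch below shows one common way to turn an audio clip into a mel-spectrogram image suitable for a CNN. This is a minimal illustration, not the authors' exact pipeline: it assumes the librosa and matplotlib libraries, and the file names, image size, and number of mel bands are placeholder choices.

```python
# Minimal sketch (assumptions noted above): convert a speech clip into a
# mel-spectrogram image that a CNN could consume.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def save_spectrogram_image(wav_path, out_path, n_mels=128):
    """Load a speech clip and save its mel spectrogram as an image file."""
    y, sr = librosa.load(wav_path, sr=None)           # keep the native sampling rate
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S_db = librosa.power_to_db(S, ref=np.max)         # convert power to decibels ("loudness")

    fig, ax = plt.subplots(figsize=(3, 3))
    librosa.display.specshow(S_db, sr=sr, ax=ax)      # time x frequency representation
    ax.axis("off")                                    # drop axes so only pixels remain
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)

# Hypothetical usage: save_spectrogram_image("sample_clip.wav", "sample_clip.png")
```

The same library also exposes `librosa.feature.mfcc`, which could be rendered in an analogous way to produce the MFCC images used in the comparison model.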