Abstract
Humans most often express their emotions through speech, and detecting emotion from speech is an important task in which machine learning plays a significant role. In this paper, we apply and compare traditional machine learning models and deep learning models, using spectral features such as Mel-frequency cepstral coefficients (MFCCs), on a combined dataset built from multiple audio sources such as RAVDESS, TESS, and SAVEE. A Random Forest classifier achieves an overall accuracy of 86.3 percent in predicting emotion classes, and a boosting ensemble achieves 85.8 percent. Deep learning techniques such as LSTM and CNN are also applied and compared with the traditional machine learning techniques; they achieve around 75 percent overall accuracy.
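To make the feature named above concrete, the following is a minimal sketch of how MFCCs can be computed for a single audio frame, using only the Python standard library. It is not the paper's extraction pipeline, which the abstract does not describe. The frame length (256 samples), sample rate (16 kHz), filter count (20), coefficient count (13), and all function names are illustrative assumptions.

```python
import math

def hz_to_mel(f):
    # Common (HTK-style) Hz-to-mel conversion.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, mapped to DFT bin indices.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for b in range(lo, mid):      # rising slope
            filt[b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):      # falling slope
            filt[b] = (hi - b) / (hi - mid)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs for one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty or silent bands.
    log_e = [math.log(sum(w * p for w, p in zip(f, spec)) + 1e-10) for f in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Illustrative input: one 256-sample frame of a 440 Hz tone at 16 kHz.
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(256)]
coeffs = mfcc(frame, SAMPLE_RATE)
print(len(coeffs))
```

In practice a library routine would normally replace this hand-written version, and the per-frame coefficients are commonly summarized across a recording (for example by averaging) to give a fixed-length vector for a classifier such as a random forest. Whether the paper does so is not stated in the abstract.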