Abstract
This paper introduces two significant contributions: a new feature for emotion classification from speech signals, based on histograms of Mel-Frequency Cepstral Coefficients (MFCCs) extracted from the audio files, and a new multi-lingual, multi-speaker speech database covering three emotions. In this study, the Berlin Database (BD) (in German) and our custom PAU database (in English), created from YouTube videos and popular TV shows, are employed for training and evaluation. Experimental results show that our proposed feature leads to better classification results than current state-of-the-art Support Vector Machine (SVM) approaches from the literature. Thanks to this novel feature, our study outperforms a number of studies based on MFCC features and SVM classifiers, including recent ones. Since no prior approaches use our novel feature, one of the most common MFCC-and-SVM frameworks is implemented, and one of the most widely used databases, the Berlin DB, is employed to compare our approach against such methods.
Highlights
Human-computer interaction systems have been drawing increasing attention in recent years
In order to increase the accuracy of recognizing spoken words, many state-of-the-art automatic speech recognition systems are dedicated to natural language understanding
Various types of classifiers have been used for the task of speech emotion classification: Hidden Markov Model (HMM), Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Artificial Neural Networks (ANN), k-Nearest Neighbors (k-NN), and many others
Summary
Human-computer interaction systems have been drawing increasing attention in recent years. In order to increase the accuracy of recognizing spoken words, many state-of-the-art automatic speech recognition systems are dedicated to natural language understanding, and emotion classification plays a key role in improving its performance. MFCCs are calculated for all audio files in both of the utilized databases and classified by emotion type. Our contributions are a novel feature, a histogram-based representation of MFCCs, and the PAU speech database, whose emotion labels were annotated and cross-checked by PhD students.
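The histogram-based MFCC feature described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact pipeline: the bin count, value range, normalization, and SVM kernel are assumptions, and random matrices stand in for real MFCCs (which would normally come from an audio feature-extraction library).

```python
import numpy as np
from sklearn.svm import SVC

def histogram_features(mfcc, n_bins=20, value_range=(-50.0, 50.0)):
    """Turn an MFCC matrix (n_coeffs x n_frames) into a fixed-length
    feature vector: histogram each coefficient's values over time,
    normalize the counts, and concatenate them."""
    feats = []
    for coeff in mfcc:
        counts, _ = np.histogram(coeff, bins=n_bins, range=value_range)
        feats.append(counts / max(counts.sum(), 1))  # per-coefficient normalization
    return np.concatenate(feats)

# Synthetic stand-ins for MFCC matrices of two emotion classes
# (13 coefficients x 100 frames each); real data would replace these.
rng = np.random.default_rng(0)
X = [histogram_features(rng.normal(loc=m, scale=5.0, size=(13, 100)))
     for m in (0.0,) * 20 + (10.0,) * 20]
y = [0] * 20 + [1] * 20

# Classify the histogram features with an SVM, as in the paper's framework.
clf = SVC(kernel="rbf").fit(X, y)
```

Because every utterance maps to the same feature length regardless of its duration, the histogram representation sidesteps the variable-length problem of raw per-frame MFCCs when feeding a standard SVM.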