Abstract

This paper introduces two significant contributions: a new feature for emotion classification from speech signals, based on histograms of MFCCs (Mel-Frequency Cepstral Coefficients) extracted from audio files, and a new multi-lingual, multi-speaker speech database covering three emotions. In this study, the Berlin Database (BD) (in German) and our custom PAU database (in English), created from YouTube videos and popular TV shows, are employed for training and evaluation. Experimental results show that the proposed features yield better classification results than current state-of-the-art Support Vector Machine (SVM) approaches from the literature. Thanks to this novel feature, our study outperforms a number of studies, including recent ones, that rely on MFCC features and SVM classifiers. Since no prior work uses a feature of this kind, we implement one of the most common MFCC-and-SVM frameworks on one of the most common databases, the Berlin DB, to compare our novel approach against approaches of that type.

Highlights

  • Human-computer interaction systems have been drawing increasing attention in recent years

  • In order to increase the accuracy of recognizing words spoken by humans, many state-of-the-art automatic speech recognition systems are dedicated to natural language understanding

  • Various types of classifiers have been used for the task of speech emotion classification: Hidden Markov Model (HMM), Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Artificial Neural Networks (ANN), k-Nearest Neighbors (k-NN), and many others


Summary

INTRODUCTION

Human-computer interaction systems have been drawing increasing attention in recent years. In order to increase the accuracy of recognizing words spoken by humans, many state-of-the-art automatic speech recognition systems are dedicated to natural language understanding, and emotion classification plays a key role in improving their performance. In this study, MFCCs are calculated for all audio files in both of the utilized databases and classified according to emotion type. The paper makes two contributions: a novel feature, a representation of MFCCs based on their histograms, and the PAU speech database, whose emotion labels were assigned and cross-checked by PhD students.
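The histogram-of-MFCC feature construction described above can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it assumes librosa for MFCC extraction and scikit-learn for the SVM, and the coefficient count (13) and bin count (10) are placeholder choices rather than values taken from the paper.

    import numpy as np
    import librosa
    from sklearn.svm import SVC

    N_MFCC = 13  # number of MFCC coefficients (assumed, not from the paper)
    N_BINS = 10  # histogram bins per coefficient (assumed)

    def mfcc_histogram_features(path):
        # Load the audio file at its native sampling rate.
        y, sr = librosa.load(path, sr=None)
        # MFCC matrix of shape (N_MFCC, n_frames).
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
        # One normalized histogram per coefficient, concatenated into a
        # fixed-length vector of size N_MFCC * N_BINS.
        hists = [np.histogram(c, bins=N_BINS, density=True)[0] for c in mfcc]
        return np.concatenate(hists)

    # Hypothetical usage, with file paths and emotion labels drawn from a
    # corpus such as the Berlin DB:
    # X = np.stack([mfcc_histogram_features(p) for p in paths])
    # clf = SVC(kernel="rbf").fit(X, labels)

One caveat of this sketch: np.histogram chooses bin edges per file, so a shared bin range estimated over the whole corpus would make the histograms directly comparable across utterances.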

LITERATURE SURVEY
Preprocessing
Feature Extraction
Software Toolbox
Algorithm
MFCC-BASED FEATURE VECTORS
Feature Set 3: Concatenation of Feature Set 1 and Feature Set 2
Database Features
Labelling
EXPERIMENTAL RESULTS
CONCLUSIONS