Abstract
This paper introduces two significant contributions: a new feature for emotion classification from speech signals, based on histograms of Mel-Frequency Cepstral Coefficients (MFCCs) extracted from the audio files, and a new multi-lingual, multi-speaker speech database covering three emotions. In this study, the Berlin Database (BD) (in German) and our custom PAU database (in English), created from YouTube videos and popular TV shows, are employed for training and evaluation. Experimental results show that our proposed feature leads to better classification results than current state-of-the-art Support Vector Machine (SVM) approaches from the literature. Thanks to this novel feature, our study outperforms a number of studies based on MFCC features and SVM classifiers, including recent ones. Since no prior approaches use our novel feature, one of the most common MFCC-and-SVM frameworks is implemented, and one of the most widely used databases, the Berlin DB, is employed to compare our approach against such methods.
Highlights
Human-computer interaction systems have been drawing increasing attention in recent years
In order to increase the accuracy of recognizing spoken words, many state-of-the-art automatic speech recognition systems are dedicated to natural language understanding
Various types of classifiers have been used for the task of speech emotion classification: Hidden Markov Model (HMM), Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Artificial Neural Networks (ANN), k-Nearest Neighbors (k-NN), and many others
Summary
Human-computer interaction systems have been drawing increasing attention in recent years. In order to increase the accuracy of recognizing spoken words, many state-of-the-art automatic speech recognition systems are dedicated to natural language understanding, and emotion classification plays a key role in improving its performance. MFCCs are calculated for all audio files in both of the utilized databases and classified by emotion type. Our contributions are a novel feature, a histogram-based representation of MFCCs, and the PAU speech database, whose emotion labels were annotated and cross-checked by PhD students.
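The histogram-based MFCC feature described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact pipeline: the bin count, value range, normalization, and SVM kernel are assumptions, and random matrices stand in for real MFCCs (which would normally come from an audio feature-extraction library).

```python
import numpy as np
from sklearn.svm import SVC

def histogram_features(mfcc, n_bins=20, value_range=(-50.0, 50.0)):
    """Turn an MFCC matrix (n_coeffs x n_frames) into a fixed-length
    feature vector: histogram each coefficient's values over time,
    normalize the counts, and concatenate them."""
    feats = []
    for coeff in mfcc:
        counts, _ = np.histogram(coeff, bins=n_bins, range=value_range)
        feats.append(counts / max(counts.sum(), 1))  # per-coefficient normalization
    return np.concatenate(feats)

# Synthetic stand-ins for MFCC matrices of two emotion classes
# (13 coefficients x 100 frames each); real data would replace these.
rng = np.random.default_rng(0)
X = [histogram_features(rng.normal(loc=m, scale=5.0, size=(13, 100)))
     for m in (0.0,) * 20 + (10.0,) * 20]
y = [0] * 20 + [1] * 20

# Classify the histogram features with an SVM, as in the paper's framework.
clf = SVC(kernel="rbf").fit(X, y)
```

Because every utterance maps to the same feature length regardless of its duration, the histogram representation sidesteps the variable-length problem of raw per-frame MFCCs when feeding a standard SVM.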