Abstract

Emotion recognition is one of the most widely studied topics in speech technology. Emotions conveyed in speech can carry useful information for many purposes. The main components of speech emotion recognition are the speech features, the speech corpus, and the machine learning algorithm used as the classifier. In this paper, a cross-corpus method is used to conduct Indonesian Speech Emotion Recognition (SER) with a combination of Mel Frequency Cepstral Coefficients (MFCC) and Teager Energy features. Using a Support Vector Machine (SVM) as the classifier, the experimental results show that applying the cross-corpus method by adding corpora from other languages to the training dataset improves emotion classification accuracy by 4.16% with the MFCC Statistics feature and 2.09% with the Teager-MFCC Statistics feature.

Highlights

  • Information Technology (IT) sectors are growing rapidly, especially in the area of mobile devices

  • Testing with the Mel Frequency Cepstral Coefficients (MFCC) Statistics feature achieved accuracies of 83.33% and 79.17% for the first and second scenarios, respectively, whereas the Teager-MFCC Statistics feature achieved 85.42% and 83.33%

  • Applying the cross-corpus method by adding corpora from other languages to the training dataset can improve the overall performance of emotion recognition, including Indonesian Speech Emotion Recognition (SER)

Summary

INTRODUCTION

Information Technology (IT) is growing rapidly, especially in the area of mobile devices. One simple application is a virtual assistant that compiles a comforting song playlist when it recognizes sadness in the user's speech. Because of this high potential for use, the emotion recognition process itself needs further analysis. The first main topic of this study is the use of the cross-corpus method [7] for Indonesian SER. There are three corpora from other languages: one German corpus and two English corpora. The second main topic is the combination of two kinds of speech features: Mel Frequency Cepstral Coefficients (MFCC) and Teager Energy. The MFCC features are combined with Teager Energy features [9] in the hope of achieving better results. These speech features are extracted from the corpus and used along with their statistical values.
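
To make the Teager Energy idea concrete, the following is a minimal sketch of the standard discrete Teager energy operator, psi[x(n)] = x(n)^2 - x(n-1) x(n+1), together with a simple way to reduce frame-level features to statistical values. The function names `teager_energy` and `summary_statistics` are hypothetical, and the choice of mean and standard deviation as the statistics is an assumption. This summary does not specify which statistics the paper uses or exactly how Teager energy is combined with MFCC.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two edge samples have no neighbour on one side, so the output
    is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def summary_statistics(frames):
    """Collapse a (n_frames, n_coeffs) feature matrix into one vector.

    Assumption: mean and standard deviation per coefficient. The paper's
    actual set of statistics is not given in this summary.
    """
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])
```

For a pure tone x(n) = A cos(wn), this operator returns the constant A^2 sin^2(w), so it tracks both the amplitude and the frequency of the signal. That property is one common motivation for using it in speech analysis.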

RELATED WORKS
EXPERIMENTS
Preparing Corpus
Extracting the Speech Features
Configuring Corpus for Training and Testing
Conducting Training and Testing
Testing Result for the Same Corpus
Testing Result for Different Corpus
Analysis
CONCLUSION