Speech Emotion Recognition of Indonesian Movie Audio Tracks based on MFCC and SVM

Muljono Muljono, Muhammad Ramadhan Prasetya, Catur Supriyanto, Agus Harjoko

doi:10.1109/ic3i46837.2019.9055509

Abstract

Speech emotion recognition has become an important part of signal processing research and supports many useful applications. This study investigates the performance of mel-frequency cepstral coefficients (MFCC) for Indonesian speech emotion recognition. The dataset consists of audio tracks from Indonesian movies collected from the internet. Preprocessing is performed to extract the audio from each movie. The audio tracks were then selected and labelled with one of four emotion classes: angry, sad, happy, and neutral. A Support Vector Machine (SVM) is used to recognise the emotion in Indonesian speech. MFCC and SVM are among the most commonly used feature extraction and classification methods in speech recognition. The MFCC features are evaluated with several SVM kernel functions: the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel. In the results, the SVM with a linear kernel achieves the highest accuracy, 66%, whereas the SVM with a polynomial kernel reaches an accuracy of 45%.
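
The pipeline summarised above (MFCC features followed by an SVM evaluated with linear, polynomial, and RBF kernels) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the frame size, hop, and number of filters, the use of per-utterance means and standard deviations as the feature vector, the use of scikit-learn's `SVC` with default hyperparameters and 5-fold cross-validation, and above all the data, which are synthetic noisy tones standing in for the four emotion classes. The helper names `mfcc`, `utterance_features`, and `compare_kernels` are hypothetical. The scores it produces say nothing about the accuracies reported in the abstract.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Return an (n_frames, n_mfcc) MFCC matrix for a 1-D signal."""
    # Cut the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2.0) / 700.0)
    mel_pts = np.linspace(0.0, mel_max, n_mels + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    # Log mel energies, then a DCT-II to decorrelate them.
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T


def utterance_features(signal, sr=16000):
    """Assumed aggregation: mean and std of each coefficient over frames."""
    coeffs = mfcc(signal, sr)
    return np.concatenate([coeffs.mean(axis=0), coeffs.std(axis=0)])


def compare_kernels(X, y, kernels=("linear", "poly", "rbf"), folds=5):
    """Mean cross-validated accuracy of an SVM for each kernel."""
    scores = {}
    for kernel in kernels:
        clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        scores[kernel] = float(cross_val_score(clf, X, y, cv=folds).mean())
    return scores


# Synthetic stand-in data (NOT the movie dataset): four "classes" of
# one-second noisy tones with different base frequencies.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
X, y = [], []
for label, f0 in enumerate([150.0, 300.0, 600.0, 1200.0]):
    for _ in range(10):
        freq = f0 * (1.0 + 0.05 * rng.standard_normal())
        sig = np.sin(2 * np.pi * freq * t) + 0.5 * rng.standard_normal(sr)
        X.append(utterance_features(sig, sr))
        y.append(label)
X, y = np.array(X), np.array(y)

scores = compare_kernels(X, y)
for kernel, acc in scores.items():
    print(f"{kernel:>6}: {acc:.2f}")
```

With real recordings, the synthetic loop would be replaced by loading each labelled clip and passing its samples to `utterance_features`; kernel hyperparameters such as `C`, `degree`, and `gamma` would normally be tuned rather than left at their defaults.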
