Abstract

One of the major challenges in human emotion recognition is the extraction of features containing maximum prosodic information. The accuracy of the entire emotion detection system ultimately relies on the efficiency of the selected features. When identifying emotions from voice, ambiguity in detection can never be completely avoided, for several reasons. Excluding redundant information to reduce confusion in recognizing emotions is quite challenging. The primary objective of this work is to improve the accuracy of an existing emotion recognition method that uses Mel-frequency cepstral coefficient (MFCC) features. In this work, an additional step has been introduced into the method to make it more efficient at recognizing emotions from voice. Instead of taking the whole signal frequency range for filter bank analysis in the MFCC computation, it is suggested to optimize the analysis frequency range for maximum accuracy. The proposed method has been tested on two standard speech emotion databases: the Berlin Emo-DB database [1] and an Assamese database [2]. The addition of this extra step has been found to increase speaker-independent emotion recognition accuracy by 15% for the Assamese database and by around 25% for the Berlin database.
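The core idea of restricting the filter-bank analysis range can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the band edges or filter count, so the values below (16 kHz sampling, 26 filters, a hypothetical 300–4000 Hz band) are assumptions chosen only to show how the lower and upper analysis frequencies enter the mel filter-bank construction.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate, fmin, fmax):
    """Triangular mel filter bank restricted to the band [fmin, fmax] Hz."""
    # Band edges are equally spaced on the mel scale between fmin and fmax,
    # so narrowing [fmin, fmax] concentrates every filter in that band.
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Map band-edge frequencies to FFT bin indices
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):                      # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                     # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

# Full-range bank (0 Hz to Nyquist) vs. a narrowed, hypothetical band;
# in the paper's method the band would be tuned for best accuracy.
full = mel_filter_bank(26, 512, 16000, 0.0, 8000.0)
narrow = mel_filter_bank(26, 512, 16000, 300.0, 4000.0)
```

After this filter bank is applied to the power spectrum of each frame, the remaining MFCC steps (log compression and the discrete cosine transform) proceed unchanged; only the analysis band fed to the filters differs.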
