Abstract

The analysis of human emotions plays a significant role in patient monitoring, as tracking patients' feelings provides information that supports better management of their diseases. Audio-based emotion recognition has become a fascinating research interest in such domains over the last decade. Audio-based emotion systems depend mostly on the recognition stage. Existing models share a common issue, the objectivity-supposition problem, which can decrease the recognition rate. Therefore, this study investigates an improved classifier based on the hidden conditional random fields (HCRF) model for classifying emotional speech. In this model, we introduce a novel methodology that incorporates multivariate distributions by employing a mixture of full-covariance Gaussian density functions. Owing to this incorporation, the proposed model addresses most of the limitations of existing classifiers. Well-known features such as Mel-frequency cepstral coefficients (MFCCs) are extracted in our experiments. The proposed model is validated and evaluated on two publicly available datasets: the Berlin Database of Emotional Speech (Emo-DB) and the eNTERFACE'05 Audio-Visual Emotion dataset. For validation and comparison against existing techniques, we use a 10-fold cross-validation scheme. The proposed method achieves a significant improvement in classification accuracy (p < 0.03). Moreover, we show that the proposed technique is computationally less expensive than state-of-the-art methods.
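
To make the pipeline described above concrete, the sketch below illustrates the MFCC extraction and 10-fold cross-validation steps. It is a minimal sketch, not the paper's implementation: the librosa/scikit-learn toolchain, the choice of 13 coefficients, the 25 ms/10 ms framing, mean-pooled utterance vectors, and the logistic-regression stand-in (occupying the slot where the HCRF classifier would go) are all assumptions.

    # Minimal sketch: MFCC features + 10-fold cross-validation scaffolding.
    # Assumptions (not from the paper): librosa/scikit-learn toolchain,
    # 13 coefficients, 16 kHz audio, mean-pooled utterance features, and a
    # logistic-regression stand-in where the HCRF classifier would go.
    import numpy as np
    import librosa
    from sklearn.model_selection import StratifiedKFold
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    def extract_mfcc(path, sr=16000, n_mfcc=13):
        """Load one utterance and return a fixed-length MFCC feature vector."""
        y, sr = librosa.load(path, sr=sr)
        # Frame-level MFCCs, shape (n_mfcc, n_frames): 25 ms windows, 10 ms hop.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=int(0.025 * sr),
                                    hop_length=int(0.010 * sr))
        # Mean-pool over time for a simple utterance-level representation.
        return mfcc.mean(axis=1)

    def evaluate(wav_paths, labels):
        """10-fold stratified cross-validation over utterance-level features."""
        X = np.stack([extract_mfcc(p) for p in wav_paths])
        y = np.asarray(labels)
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        scores = []
        for train_idx, test_idx in skf.split(X, y):
            clf = LogisticRegression(max_iter=1000)  # stand-in for the HCRF model
            clf.fit(X[train_idx], y[train_idx])
            scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
        return float(np.mean(scores)), float(np.std(scores))

Note that an actual HCRF operates on the frame-level MFCC sequence rather than a pooled vector; the pooling here only serves to keep the stand-in classifier self-contained.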
