HYBRID FUSION OF FACE AND SPEECH INFORMATION FOR BIMODAL EMOTION ESTIMATION

Krishna Mohan Kudiri

doi:10.11113/jt.v78.9538

Abstract

Estimating human emotions during a conversation is a difficult task for a computer. In this study, facial expressions and speech are used to estimate six emotions (anger, sadness, happiness, boredom, disgust and surprise). A hybrid system combining facial expressions and speech is proposed to estimate the emotions of a person engaged in a conversational session. Relative Bin Frequency Coefficients and Relative Sub-Image-Based features are used for the acoustic and visual modalities, respectively, and a Support Vector Machine is used for classification. The study shows that the proposed feature extraction from acoustic and visual data, together with the proposed fusion technique, is the most prominent aspect affecting the emotion detection system. Other aspects are also considered to affect the system, but their effect is relatively minor. It was observed that the bimodal system performed worse than the unimodal system based on deliberate facial expressions. To deal with this problem, a suitable database is used. The results indicate that the proposed system outperformed the rest with respect to the basic emotional classes.
