Abstract

Emotion recognition systems aim to identify the emotions of human subjects from underlying data with acceptable accuracy. Audio and visual signals, the primary modalities of human emotion perception, have received the most attention in the development of intelligent systems for natural interaction. An emotion recognition system should automatically identify a subject's emotional state from his or her voice and facial images, ideally unaffected by recording constraints. In this work, an audio-visual emotion recognition system has been developed that fuses the two modalities at the decision level. First, separate emotion recognition systems based on speech and on facial expressions were developed and tested. The speech emotion recognition system was evaluated on two standard speech emotion databases: the Berlin EMO-DB and an Assamese database. The performance of the visual emotion recognition system was analyzed on the eNTERFACE'05 database. A decision rule was then defined to fuse the audio and visual information at the decision level. The proposed multi-modal system was tested on the same eNTERFACE'05 database.
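The abstract does not specify the paper's actual decision rule, so the following is only a minimal sketch of one common decision-level fusion scheme: it assumes each unimodal classifier outputs a per-class confidence vector over the six eNTERFACE'05 emotions and combines them with a weighted sum. The function name `fuse_decisions`, the parameter `audio_weight`, and the example scores are hypothetical, not taken from the paper.

```python
# Illustrative decision-level fusion sketch (assumed weighted-sum rule,
# not necessarily the rule used in the paper): each unimodal classifier
# produces per-class confidences, and the fused decision is the argmax
# of their weighted combination.
import numpy as np

# The six emotion classes of the eNTERFACE'05 database.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def fuse_decisions(audio_scores: np.ndarray,
                   visual_scores: np.ndarray,
                   audio_weight: float = 0.5) -> str:
    """Fuse speech and facial-expression classifier confidences and
    return the jointly predicted emotion label."""
    # Normalize each modality's scores so they behave like probabilities.
    audio_p = audio_scores / audio_scores.sum()
    visual_p = visual_scores / visual_scores.sum()
    # Weighted-sum fusion; the weight trades off trust between modalities.
    fused = audio_weight * audio_p + (1.0 - audio_weight) * visual_p
    return EMOTIONS[int(np.argmax(fused))]

# Example: audio favors "anger", video favors "fear"; fusion resolves the
# disagreement according to the chosen modality weight.
audio = np.array([0.60, 0.05, 0.20, 0.05, 0.05, 0.05])
visual = np.array([0.25, 0.05, 0.55, 0.05, 0.05, 0.05])
print(fuse_decisions(audio, visual, audio_weight=0.6))  # -> "anger"
```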
