Abstract

Understanding human emotional states is indispensable for our daily interactions, and we can enjoy a more natural and friendly human-computer interaction (HCI) experience by fully utilizing a person's affective states. In emotion recognition applications, multimodal information fusion is widely used to discover the relationships among multiple information sources and to make joint use of a number of channels, such as speech, facial expression, gesture and physiological processes. This thesis proposes a new framework for emotion recognition using information fusion based on the estimation of information entropy. Novel techniques from information theoretic learning are applied to feature level fusion and score level fusion. The most critical issues for feature level fusion are feature transformation and dimensionality reduction. Existing methods depend on second order statistics, which are optimal only for Gaussian-like distributions. By incorporating information theoretic tools, a new feature level fusion method based on kernel entropy component analysis is proposed. For score level fusion, most previous methods focus on predefined rule based approaches, which are usually heuristic. In this thesis, a connection between information fusion and the maximum correntropy criterion is established for effective score level fusion. The feature level and score level fusion methods are then combined into a two-stage fusion platform. The proposed methods are applied to audiovisual emotion recognition, and their effectiveness is evaluated by experiments on two publicly available audiovisual emotion databases. The experimental results demonstrate that the proposed algorithms achieve improved performance in comparison with existing methods.
The work of this thesis offers a promising direction for designing more advanced emotion recognition systems based on multimodal information fusion and is of great significance for the development of intelligent human-computer interaction systems.

Highlights

  • Recognition of emotional states can help us estimate the desires and future behavior of a person

  • We propose a new dual-level framework of multimodal information fusion, which consists of a feature level fusion module based on kernel entropy component analysis and a score level fusion module based on the maximum correntropy criterion

  • Taking into consideration the limitations of existing predefined rule fusion methods, we propose a novel approach based on the maximum correntropy criterion for score level fusion

Summary

Introduction

1.1 Background

Recognition of emotional states can help us estimate the desires and future behavior of a person. The audio features are combined to construct a joint feature vector, but a high-dimensional feature set may suffer from the problem of data sparseness and strain computational resources. To address this disadvantage, a feature level fusion method based on kernel entropy component analysis is explored for audio emotion recognition. To improve recognition performance at the score level, a new score level fusion method based on information theoretic tools, the maximum correntropy criterion (MCC) in particular, is proposed. Most of the existing methods have not taken full advantage of the intrinsic characteristics of the matching scores from different modalities [119]. Their performance is usually degraded by a lack of sufficient training data and by noisy training samples.
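To make the feature level fusion idea concrete, below is a minimal NumPy sketch of kernel entropy component analysis as described in the literature the thesis builds on: eigendecompose a Gaussian kernel matrix, rank the eigenpairs by their contribution to the Renyi entropy estimate, and project onto the top contributors. The function name, the Gaussian kernel, and the default parameters are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def keca(X, n_components, sigma=1.0):
    """Kernel entropy component analysis (sketch): keep the kernel
    eigen-directions contributing most to the Renyi entropy estimate
    V = (1/N^2) * sum_i lam_i * (1^T e_i)^2, not the largest eigenvalues."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    K = np.exp(-d2 / (2.0 * sigma**2))               # Gaussian kernel matrix
    lam, E = np.linalg.eigh(K)                       # eigenvalues in ascending order
    lam = np.maximum(lam, 0.0)                       # clip tiny numerical negatives
    contrib = lam * E.sum(axis=0) ** 2               # entropy contribution per eigenpair
    idx = np.argsort(contrib)[::-1][:n_components]   # top entropy contributors
    return E[:, idx] * np.sqrt(lam[idx])             # columns sqrt(lam_i) * e_i
```

Unlike kernel PCA, the selected components need not be the leading eigenvectors, which is what makes the method sensitive to the entropy structure of the data rather than only its variance.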
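For the score level side, one common way to optimize a maximum correntropy criterion is the half-quadratic technique, which turns the objective into a sequence of reweighted least squares solves. The sketch below learns linear fusion weights for modality scores under MCC; it is a generic illustration under assumed names and a fixed Gaussian kernel width, not the exact formulation of the thesis.

```python
import numpy as np

def mcc_fusion_weights(S, y, sigma=0.5, n_iter=50):
    """Fit score-fusion weights w by maximizing the correntropy
    (1/N) * sum_i exp(-(y_i - S_i @ w)^2 / (2*sigma^2))
    via half-quadratic optimization (iteratively reweighted least squares).
    S: (N, m) matrix of per-modality matching scores; y: (N,) targets."""
    w = np.linalg.lstsq(S, y, rcond=None)[0]         # ordinary least-squares init
    for _ in range(n_iter):
        e = y - S @ w                                # fusion residuals
        lam = np.exp(-e**2 / (2.0 * sigma**2))       # Gaussian weights: outliers -> ~0
        A = S.T @ (lam[:, None] * S)                 # weighted normal equations
        b = S.T @ (lam * y)
        w = np.linalg.solve(A, b)
    return w
```

Because each sample's weight decays exponentially with its residual, grossly corrupted scores are effectively ignored, which is the robustness-to-noisy-samples property that motivates MCC over squared-error fusion rules.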

