Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features

Leimin Tian, Johanna Moore, Catherine Lai

doi:10.1109/slt.2016.7846319

Abstract

Automatic emotion recognition is vital for building natural and engaging human-computer interaction systems. Combining information from multiple modalities typically improves emotion recognition performance. In previous work, features from different modalities have generally been fused at the same level with two types of fusion strategies: Feature-Level fusion, which concatenates feature sets before recognition; and Decision-Level fusion, which makes the final decision based on outputs of the unimodal models. However, different features may describe data at different time scales or have different levels of abstraction. Cognitive Science research also indicates that when perceiving emotions, humans use information from different modalities at different cognitive levels and time steps. Therefore, we propose a Hierarchical fusion strategy for multimodal emotion recognition, which incorporates global or more abstract features at higher levels of its knowledge-inspired structure. We build multimodal emotion recognition models combining state-of-the-art acoustic and lexical features to study the performance of the proposed Hierarchical fusion. Experiments on two emotion databases of spoken dialogue show that this fusion strategy consistently outperforms both Feature-Level and Decision-Level fusion. The multimodal emotion recognition models using the Hierarchical fusion strategy achieved state-of-the-art performance on recognizing emotions in both spontaneous and acted dialogue.
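The three fusion strategies named in the abstract differ mainly in where the modalities meet. The following is a minimal sketch of that wiring only, not the authors' models: the layers are untrained toy layers with random weights, the function names, layer sizes, and feature dimensions are hypothetical, and treating the acoustic vector as the lower-level input and the lexical vector as the higher-level input is an assumption made purely for illustration.

```python
import math
import random


def layer(x, n_out, rng, activate=True):
    """Toy fully connected layer with random weights (stands in for a trained layer)."""
    out = []
    for _ in range(n_out):
        z = sum(xi * rng.gauss(0.0, 0.1) for xi in x)
        out.append(math.tanh(z) if activate else z)
    return out


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def feature_level(acoustic, lexical, n_classes, rng):
    # Concatenate both feature sets first, then run a single model.
    fused = acoustic + lexical
    hidden = layer(fused, 16, rng)
    return softmax(layer(hidden, n_classes, rng, activate=False))


def decision_level(acoustic, lexical, n_classes, rng):
    # Run one model per modality, then combine their outputs (here: a plain average).
    p_a = softmax(layer(layer(acoustic, 16, rng), n_classes, rng, activate=False))
    p_l = softmax(layer(layer(lexical, 16, rng), n_classes, rng, activate=False))
    return [(a + b) / 2 for a, b in zip(p_a, p_l)]


def hierarchical(lower, higher, n_classes, rng):
    # Lower-level features enter at the bottom layer; the more abstract
    # features are joined with the hidden representation one layer up.
    hidden = layer(lower, 16, rng)
    fused = hidden + higher
    hidden2 = layer(fused, 16, rng)
    return softmax(layer(hidden2, n_classes, rng, activate=False))


rng = random.Random(0)
acoustic = [rng.gauss(0.0, 1.0) for _ in range(20)]  # hypothetical acoustic features
lexical = [rng.gauss(0.0, 1.0) for _ in range(6)]    # hypothetical lexical features

results = {
    fuse.__name__: fuse(acoustic, lexical, 3, rng)
    for fuse in (feature_level, decision_level, hierarchical)
}
for name, probs in results.items():
    print(name, [round(p, 3) for p in probs])
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows that Feature-Level fusion merges inputs before any modelling, Decision-Level fusion merges after each unimodal model has produced an output, and Hierarchical fusion introduces the second feature set partway up the network.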

Publication Date: Dec 1, 2016
Citations: 92
License type: other-oa

Similar Papers

Recognizing emotions in dialogues with acoustic and lexical features
Leimin Tian ... Johanna D Moore
01 Sep 2015

Feature-level and decision-level fusion of noncoincidently sampled sensors for land mine detection
A.H Gunatilaka ... B.A Baertlein
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 23
01 Jun 2001

Comparative Study on Feature, Score and Decision Level Fusion Schemes for Robust Multibiometric Systems
Chia Chin Lip ... Dzati Athiar Ramli
01 Jan 2012

Multi-modal Emotion Recognition Based on Speech and Image
Yongqiang Li ... Qi He
01 Jan 2018
