Mutual Correlation Attentive Factors in Dyadic Fusion Networks for Speech Emotion Recognition.

Yue Gu, Shuhong Chen, Xinyu Li, Xinyu Lyu, Weijia Sun, Weitian Li, Ivan Marsic

doi:10.1145/3343031.3351039

Abstract

Emotion recognition in dyadic communication is challenging for three reasons: 1. Extracting informative modality-specific representations requires disparate feature extractor designs because the input data formats are heterogeneous. 2. Fusing unimodal features effectively and efficiently, and learning associations between dyadic utterances, are both critical to model generalization in real-world scenarios. 3. Disagreeing annotations prevent previous approaches from precisely predicting emotions in context. To address these issues, we propose an efficient dyadic fusion network that relies only on an attention mechanism to select representative vectors, fuse modality-specific features, and learn the sequence information. Our approach has three distinct characteristics: 1. Instead of using a recurrent neural network to extract temporal associations, as most previous research does, we introduce multiple sub-view attention layers to compute the relevant dependencies among sequential utterances; this significantly improves model efficiency. 2. To improve fusion performance, we design a learnable mutual correlation factor inside each attention layer to compute associations across different modalities. 3. To overcome the label disagreement issue, we embed the labels from all annotators into a k-dimensional vector and transform the classification problem into a regression problem; this provides more accurate annotation information and makes full use of the entire dataset. We evaluate the proposed model on two published multimodal emotion recognition datasets: IEMOCAP and MELD. Our model significantly outperforms the previous state of the art by 3.8%-7.5% in accuracy while being more efficient.
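The third characteristic, turning disagreeing annotator labels into a k-dimensional regression target, can be illustrated with a minimal sketch. The abstract does not say how the embedding is built, so this assumes the simplest reading: each dimension holds the fraction of annotators who chose that emotion. The label set `EMOTIONS` and the helper names `soft_label` and `mse` are hypothetical, introduced here for illustration only.

```python
from collections import Counter

# Hypothetical label set (k = 4); the datasets' actual categories may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def soft_label(votes, classes=EMOTIONS):
    """Map one utterance's annotator votes to a k-dimensional target vector.

    Assumption: each entry is the share of annotators who picked that class,
    so disagreement is kept instead of being collapsed to a majority label.
    """
    if not votes:
        raise ValueError("need at least one annotator vote")
    counts = Counter(votes)
    unknown = set(counts) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(votes)
    return [counts[c] / total for c in classes]


def mse(pred, target):
    """Mean squared error, a common regression loss for vector targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Three annotators disagree: two say "happy", one says "neutral".
target = soft_label(["happy", "happy", "neutral"])
# A hard one-hot prediction of "happy" is penalized for ignoring the minority vote.
loss = mse([0.0, 1.0, 0.0, 0.0], target)
```

Under this reading, an utterance without a majority label still yields a usable target, which is one way the entire dataset could be used rather than discarding ambiguous samples.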

Journal: Proceedings of the ... ACM International Conference on Multimedia, with co-located Symposium & Workshops. ACM International Conference on Multimedia
Publication Date: Oct 15, 2019
Citations: 23

Similar Papers

Elastic Graph Transformer Networks for EEG-Based Emotion Recognition
Wei-Bang Jiang ... Wei-Long Zheng
04 Jun 2023

Functional connectivity-enhanced feature-grouped attention network for cross-subject EEG emotion recognition
Wenhui Guo ... Yanjiang Wang
Knowledge-Based Systems | VOL. 283
14 Nov 2023

Domain Adversarial Network for Cross-Domain Emotion Recognition in Conversation
Hongchao Ma ... Qinglei Zhou
Applied Sciences | VOL. 12
27 May 2022

Multiscale-multichannel feature extraction and classification through one-dimensional convolutional neural network for Speech emotion recognition
Minying Liu ... Shuxin Zhuang
Speech Communication | VOL. 156
22 Nov 2023
