Abstract
Speech emotion recognition (SER) is a crucial topic in human–computer interaction, but extracting emotional embeddings remains challenging: embeddings extracted by network models often contain noise and incomplete emotional information. To address these challenges, this study develops an innovative model (MVIB-DVA) composed of a multi-feature variational information bottleneck (MVIB) based on the information bottleneck (IB) principle and a dual-view aware module (DVAM) with an attention mechanism. MVIB employs the IB principle as its driving model and uses learned minimal sufficient single-feature emotional embeddings as auxiliary information. The aims are to capture the unique emotional information in individual features and the complementary information between different types of features, while reducing noise and representing rich emotional information with fewer parameters. DVAM introduces (1) a frequency-domain statistical aware module (FDSAM) that emphasizes the frequencies that best reflect emotional information and (2) a frame aware module (FAM) in the time domain that focuses on the frames that contribute the most to the final emotion recognition result. A separate channel captures details overlooked by the frequency- and time-domain views, yielding more comprehensive emotional information. The experimental results show that our method performs excellently in recognizing speech emotions: MVIB-DVA achieved a weighted accuracy (WA) of 74.05% and an unweighted accuracy (UA) of 75.67% on the IEMOCAP dataset and, similarly, a WA of 86.66% and a UA of 86.51% on the RAVDESS dataset.