Abstract

In human-computer interaction, Speech Emotion Recognition (SER) plays an essential role in understanding a user's intent and improving the interactive experience. Although speech conveying similar emotions varies widely in speaker characteristics, it shares common antecedents and consequences; a central challenge for SER is therefore to produce robust, discriminative representations that capture the causality underlying speech emotions. In this paper, we propose a Gated Multi-scale Temporal Convolutional Network (GM-TCNet) that deploys a novel emotional causality representation learning component with a multi-scale receptive field. This component, built from dilated causal convolution layers and a gating mechanism, captures the dynamics of emotion across the time domain. In addition, GM-TCNet uses skip connections to fuse high-level features from different Gated Convolution Blocks (GCBs), capturing the abundant and subtle emotional changes in human speech. GM-TCNet takes a single feature type, Mel-Frequency Cepstral Coefficients (MFCC), as input, passes it through the Gated Temporal Convolutional Module (GTCM) to generate high-level features, and finally feeds those features to an emotion classifier to accomplish the SER task. Experimental results show that our model achieves the highest performance in most cases, with average relative improvements of +0.90% to +18.50% in weighted average recall and +0.55% to +20.15% in unweighted average recall over state-of-the-art techniques. The source code for SER is available at: https://github.com/Jiaxin-Ye/GM-TCNet.
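To make the two core building blocks of the abstract concrete, the following is a minimal NumPy sketch (not the authors' implementation; function names and the single-channel setting are illustrative assumptions) of a dilated causal 1-D convolution and the tanh/sigmoid gating it is combined with in a gated convolution block:

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """Causal 1-D convolution: the output at time t depends only on x[<= t].

    x: (T,) input sequence; w: (K,) kernel; dilation: spacing between taps.
    Left-padding with zeros prevents any leakage from future frames.
    """
    K = len(w)
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    T = len(x)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            # tap k looks back k * dilation steps from time t
            y[t] += w[k] * xp[pad + t - k * dilation]
    return y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv_block(x, w_filter, w_gate, dilation=1):
    """Gating mechanism: tanh(filter branch) modulated by sigmoid(gate branch)."""
    f = np.tanh(dilated_causal_conv1d(x, w_filter, dilation))
    g = sigmoid(dilated_causal_conv1d(x, w_gate, dilation))
    return f * g
```

Stacking such blocks with increasing dilation rates (1, 2, 4, ...) yields the multi-scale receptive field the paper describes, since each deeper block sees exponentially further back in time.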
