Abstract

Speech emotion recognition is challenging because of the affective gap between subjective emotions and low-level acoustic features. By integrating multilevel feature learning and model training, deep convolutional neural networks (DCNNs) have exhibited remarkable success in bridging the semantic gap in visual tasks such as image classification and object detection. This paper explores how to utilize a DCNN to bridge the affective gap in speech signals. To this end, we first extract three channels of log Mel-spectrograms (static, delta, and delta-delta), analogous to the red, green, and blue (RGB) channels of an image, as the DCNN input. Then, the AlexNet DCNN model pretrained on the large ImageNet dataset is employed to learn high-level feature representations on each segment partitioned from an utterance. The learned segment-level features are aggregated by a discriminant temporal pyramid matching (DTPM) strategy, which combines temporal pyramid matching and optimal Lp-norm pooling to form a global utterance-level feature representation, followed by a linear support vector machine (SVM) for emotion classification. Experimental results on four public datasets, namely EMO-DB, RML, eNTERFACE05, and BAUM-1s, show the promising performance of our DCNN model and the DTPM strategy. Another interesting finding is that the DCNN model pretrained for image applications performs reasonably well in affective speech feature extraction. Further fine-tuning on the target emotional speech datasets substantially improves recognition performance.
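To make the input representation concrete, the sketch below illustrates how the three-channel log Mel-spectrogram (static, delta, and delta-delta) described above could be computed with librosa. It is an illustrative approximation, not the authors' code; the sampling rate, number of Mel bands, and frame/hop lengths are assumptions, not values taken from the paper.

```python
# Illustrative sketch: three-channel log Mel-spectrogram input (static, delta,
# delta-delta), stacked like the RGB channels of an image. Parameter values
# (sr, n_mels, n_fft, hop_length) are assumptions, not the paper's settings.
import numpy as np
import librosa

def three_channel_logmel(wav_path, sr=16000, n_mels=64, n_fft=400, hop_length=160):
    """Return an array of shape (n_mels, frames, 3)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    log_mel = librosa.power_to_db(mel)               # static channel
    delta = librosa.feature.delta(log_mel, order=1)  # first temporal derivative
    delta2 = librosa.feature.delta(log_mel, order=2) # second temporal derivative
    return np.stack([log_mel, delta, delta2], axis=-1)
```

The resulting three-channel "image" can then be split into fixed-length segments along the time axis before being fed to the pretrained DCNN.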
