Abstract

Accurate emotion recognition from speech is important for applications such as smart health care, smart entertainment, and other smart services. High-accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, covering both speech signal feature extraction and emotion classification methods. Five types of features are extracted from each speech sample: mel-frequency cepstral coefficients (MFCC), pitch, formants, short-term zero-crossing rate, and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the features that best identify the emotional state of speech. We propose a novel classification method that combines a DBN with a support vector machine (SVM) instead of using either one alone. In addition, a conjugate gradient method is applied to train the DBN in order to speed up the training process. Gender-dependent experiments are conducted on an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features reflect emotional state better than hand-crafted features, and our new classification approach achieves an accuracy of 95.8%, higher than using either the DBN or the SVM separately. The results also show that a DBN can work very well on small training databases if it is properly designed.
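Two of the features named above, short-term energy and short-term zero-crossing rate, are simple frame-level statistics. The sketch below illustrates how they are typically computed; the frame length (25 ms) and hop size (10 ms) are common defaults, not values taken from the paper.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames.

    At a 16 kHz sampling rate, 400 samples = 25 ms frames with a
    160-sample (10 ms) hop -- standard short-term analysis settings.
    """
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_term_energy(frames):
    """Per-frame energy: sum of squared samples within the frame."""
    return np.sum(frames.astype(np.float64) ** 2, axis=1)

def short_term_zcr(frames):
    """Per-frame zero-crossing rate: fraction of adjacent sample
    pairs whose signs differ."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

# Toy example: a 440 Hz tone sampled at 16 kHz for one second.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)

frames = frame_signal(x)
energy = short_term_energy(frames)   # ~200 per frame for a unit-amplitude sine
zcr = short_term_zcr(frames)         # ~2 * 440 / 16000 = 0.055 crossings/sample
```

In practice these per-frame trajectories are summarized by statistics (mean, variance, range) or fed as sequences to a model such as the DBN described in the abstract.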

Highlights

  • Emotion is a mixture of people’s physiological responses and inner thoughts, and plays an important role in rational action and decision making for human beings

  • In the pre-emphasis step, a Finite Impulse Response (FIR) filter, called the pre-emphasis filter, is applied to the speech signal

  • The speech data were recorded in a clean acoustic environment with a 35 dB signal-to-noise ratio (SNR), a 16,000 Hz sampling rate, and 16-bit resolution, and stored in PCM format



Introduction

Emotion is a mixture of people’s physiological responses and inner thoughts, and plays an important role in rational action and decision making for human beings. Automatic emotion recognition from speech has been an active research topic for applications such as smart health care, smart home, smart entertainment, and many other smart services. Much research has been done on recognizing human emotions from speech information [1,2,3]. However, there is little research on emotion recognition in Chinese speech, which is more challenging because of complexities arising from the prosodic characteristics of the Chinese language. Two of the most important types of information in speech are verbal content and emotional state, both of which humans distinguish far more easily than computers do. Emotional states are expressed through many features, so extracting features from the speech signal that effectively characterize different emotions is crucial. In the pre-emphasis step, a Finite Impulse Response (FIR) filter, called the pre-emphasis filter, is applied to the speech signal.
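The pre-emphasis filter mentioned above is a first-order FIR filter, y[n] = x[n] − α·x[n−1], which boosts high-frequency content before framing and feature extraction. A minimal sketch follows; the coefficient α = 0.97 is a widely used default, not a value specified in the text.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order FIR pre-emphasis filter: y[n] = x[n] - alpha * x[n-1].

    Attenuates slowly varying (low-frequency) components and
    emphasizes rapid changes, compensating for the spectral tilt
    of voiced speech. The first sample is passed through unchanged.
    """
    x = np.asarray(x, dtype=np.float64)
    return np.append(x[0], x[1:] - alpha * x[:-1])

# Example: a flat segment is attenuated, a sudden jump is preserved.
x = np.array([1.0, 1.0, 1.0, -1.0])
y = pre_emphasis(x)
# y = [1.0, 0.03, 0.03, -1.97]
```

After pre-emphasis, the signal is typically framed and windowed before computing MFCC and the other features listed in the abstract.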

