Abstract

Research attention to emotional speech signals in human-machine interfaces has grown recently as high computational capability has become widely available. Recognition accuracy in speech emotion recognition depends strongly on the features extracted from the audio data, so feature extraction plays a central role. However, challenges remain, such as the heavy computational burden imposed by high-dimensional data. In this paper, we propose a new learning scheme with mean Fourier parameters, based on the perceptual content of voice quality, for speaker-independent speech emotion recognition. The scheme greatly reduces the dimension of the acoustic feature and substantially improves computational performance. Two speech databases (the German emotional corpus and the Interactive Emotional Dyadic Motion Capture database) are used in the experiments, and combinations of different features and classifiers are evaluated for comparison. The results show that the proposed scheme, with mean Fourier parameters combined with a Random Forest classifier, is efficient in classifying various emotional states in speech signals and outperforms the other features and classifiers.
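The pipeline the abstract describes can be sketched as follows. This is only an illustration, not the paper's implementation: here "mean Fourier parameters" are interpreted as the per-frame FFT magnitude spectrum averaged over all frames of an utterance, which yields one low-dimensional vector per utterance; the paper's exact feature definition, frame sizes, and classifier settings may differ. All signals below are synthetic.

```python
# Hedged sketch of a mean-Fourier-parameter + Random Forest pipeline.
# Assumption: the feature is the mean over frames of the first n_coeffs
# FFT magnitude coefficients (the paper's definition may differ).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mean_fourier_params(signal, frame_len=256, hop=128, n_coeffs=32):
    """Average the first n_coeffs FFT magnitudes over all frames,
    reducing a whole utterance to a single n_coeffs-dim vector."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spectra = np.abs(np.fft.rfft(frames, axis=1))[:, :n_coeffs]
    return spectra.mean(axis=0)

# Synthetic demo: two artificial "emotion classes" that differ only
# in dominant frequency, standing in for real labeled utterances.
rng = np.random.default_rng(0)

def make_utterance(freq, n=4096, sr=16000.0):
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(n)

X = np.array([mean_fourier_params(make_utterance(f))
              for f in [200] * 20 + [1200] * 20])
y = np.array([0] * 20 + [1] * 20)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("feature dimension:", X.shape[1])
print("training accuracy:", clf.score(X, y))
```

The point of the sketch is the dimensionality argument from the abstract: averaging over frames collapses a long, high-dimensional utterance into one short vector, which keeps the classifier's computational cost low.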
