Abstract
In recent years, many studies have used speech-related features for speech emotion recognition; recent work, however, shows a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) was proposed to enhance the discriminating ability of the features, and wrapper-based particle swarm optimization (WPSO) to select the most discriminating features. Three emotional speech databases were used to evaluate the proposed method, and an extreme learning machine (ELM) was employed to classify the different types of emotions. The experimental results show that the proposed method significantly improves speech emotion recognition performance compared with previous work published in the literature.
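To make the feature-extraction and classification steps more concrete, the following is a minimal sketch, not the authors' full pipeline (which also includes glottal waveform extraction, PSOC feature enhancement and WPSO feature selection): MFCC features are averaged per utterance and fed to a basic extreme learning machine. The use of librosa, the file paths, and the hyper-parameters (13 MFCCs, 200 hidden nodes) are illustrative assumptions, not values taken from the paper.

```python
# Sketch only: per-utterance MFCC features + a single-hidden-layer ELM classifier.
import numpy as np
import librosa

def mfcc_features(path, n_mfcc=13):
    """Load one utterance and return the mean MFCC vector over all frames."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # one fixed-length feature vector per utterance

class ELM:
    """ELM: random input weights, closed-form (pseudoinverse) output weights."""
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        T = np.eye(n_classes)[y]                       # one-hot emotion targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T              # least-squares output weights
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta             # class scores

# Usage with hypothetical wav_paths / labels arrays:
# X = np.vstack([mfcc_features(p) for p in wav_paths])
# model = ELM().fit(X, labels)
# predicted = model.predict(X).argmax(axis=1)
```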
Highlights
Speech utterances of an individual can provide information about his/her health state, emotion, language employed and gender
Improved speaker-independent multi-class emotion recognition can enable better communication between humans and machines
We have investigated the effectiveness of SD/GD SI
Summary
Speech utterances of an individual can provide information about his/her health state, emotion, language employed and gender. Speech is one of the most natural forms of communication between individuals. Understanding an individual's emotion can be useful in applications such as web movies, electronic tutoring, in-car board systems, diagnostic tools for therapists and call-center applications. Most existing emotional speech databases contain three types of emotional speech recordings: simulated, elicited and natural.