Novel feature representation using single frequency filtering and nonlinear energy operator for speech emotion recognition

Ramakrishna Thirumuru,Krishna Gurugubelli,Anil Kumar Vuppala

doi:10.1016/j.dsp.2021.103293

Ramakrishna Thirumuru, Krishna Gurugubelli + Show 1 more

https://doi.org/10.1016/j.dsp.2021.103293

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

In this paper, the intrinsic characteristics of speech modulations are estimated to propose the instant modulation spectral features for efficient emotion recognition. This feature representation is based on single frequency filtering (SFF) technique and higher order nonlinear energy operator. The speech signal is decomposed into frequency sub-bands using SFF, and associated nonlinear energies are estimated with higher order nonlinear energy operator. Then, the feature vector is realized using cepstral analysis. The high-resolution property of SFF technique is exploited to extract the amplitude envelope of the speech signal at a selected frequency with good time-frequency resolution. The fourth order nonlinear energy operator provides noise robustness in estimating the modulation components. The proposed feature set is tested for the emotion recognition task using the i-vector model with the probabilistic linear discriminant scoring scheme, support vector machine and random forest classifiers. The results demonstrate that the performance of this feature representation is better than the widely used spectral and prosody features, achieving detection accuracy of 85.75%, 59.88%, and 65.78% on three emotional databases, EMODB, FAU-AIBO, and IEMOCAP, respectively. Further, the proposed features are found to be robust in the presence of additive white Gaussian and vehicular noises.

Full Text