Abstract

The objective of speech emotion recognition (SER) is to enhance man-machine interface. It can also be used to cover the physiological state of a person in critical situations. In recent time, speech emotion recognition also finds its operations in medicine and forensics. A new feature extraction technique using Teager energy operator (TEO) is proposed for the detection of stressed emotions as Teager energy-autocorrelation envelope (TEO-Auto-Env). TEO is basically designed for increasing the energies of the stressed speech signals whose energies are reduced during the speech production process and hence used in this analysis. A stressed speech emotion recognition (SSER) system is developed using TEO-Auto-Env and spectral feature combination for detecting the emotions. The spectral features considered are Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), and relative spectra-perceptual linear prediction (RASTA-PLP). EMO-DB (German), EMOVO (Italian), IITKGP (Telugu), and EMA (English) databases are used in this analysis. The classification of the emotions is carried out using the k-nearest neighborhood (k-NN) classifier for gender-dependent (GD) and speaker-independent (SI) cases. The proposed SSER system provides improved accuracy compared to the existing ones. Average recall is used for performance evaluation. The highest classification accuracy is achieved using the feature combination of TEO-Auto-Env, MFCC, and LPCC features with 91.4% (SI), 91.4% (GD-male), and 93.1%(GD-female) for EMO-DB; 68.5% (SI), 68.5% (GD-male), and 74.6% (GD-female) for EMOVO; 90.6%(SI), 91% (GD-male), and 92.3% (GD-female) for EMA; and 95.1% (GD-female) for IITKGP female database.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call