Abstract

Mel-frequency cepstral coefficients (MFCCs), which represent the magnitude spectrum, and Teager energy operator (TEO) features are often used in emotion recognition, while phase spectrum information is generally ignored. This work proposes an approach that uses the group delay function derived from all-pole models (APGD) to represent phase information for recognizing emotional arousal from speech. Experiments were conducted on the CRISIS acted speech database, which covers four levels of stress. The results of the arousal recognition system using APGD features are compared with those obtained using MFCCs and Critical Band Based TEO Autocorrelation Envelope (TEO-CB-Auto-Env) features, which have previously been used successfully for emotion and stress detection. Feature extraction is applied to the voiced parts of speech. The combination of APGD, MFCC, and TEO-CB-Auto-Env features gave the best recognition results, confirming the hypothesis that the phase and magnitude spectra contain complementary information and that their combination can improve the reliability of the arousal recognition system.
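
As an illustration of the core idea, the group delay of an all-pole model can be computed directly from linear-prediction coefficients. The sketch below is not the authors' implementation: the function names `lpc_autocorr` and `all_pole_group_delay`, the autocorrelation method for fitting the model, and the FFT size are all assumptions made for the example, and any further processing the paper applies to obtain the final APGD features is not shown. It relies on the identity that the group delay of a polynomial A(z) with coefficients a[n] equals Re{FFT(n·a[n]) / FFT(a[n])}, so the all-pole filter 1/A(z) has the negated value.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Fit an all-pole model with the autocorrelation method (assumed here).

    Returns the coefficients [1, a1, ..., ap] of A(z), where the model is 1/A(z).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Toeplitz matrix of the normal equations, R[i, j] = r[|i - j|].
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    pred = np.linalg.solve(r[idx], r[1 : order + 1])
    return np.concatenate(([1.0], -pred))


def all_pole_group_delay(a, n_fft=512):
    """Group delay (in samples) of 1/A(z) on n_fft//2 + 1 frequencies in [0, pi].

    Assumes A(z) has no zeros exactly on the unit circle.
    """
    a = np.asarray(a, dtype=float)
    k = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)       # A(e^{jw})
    B = np.fft.rfft(k * a, n_fft)   # transform of n * a[n]
    # Group delay of A is Re(B / A); inverting the filter flips the sign.
    return -np.real(B / A)


# Toy check: the impulse response of 1 / (1 - 0.9 z^-1) stands in for a voiced frame.
frame = 0.9 ** np.arange(400)
a = lpc_autocorr(frame, order=1)
tau = all_pole_group_delay(a)
```

For this single-pole toy signal the fitted coefficients are close to [1, -0.9], and the group delay at zero frequency is close to 0.9 / (1 - 0.9) = 9 samples. With real speech, a higher model order and windowed voiced frames would be used; those choices are left open here.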
