Abstract

Speech Emotion Recognition (SER) has attracted growing attention in recent years, and various methods have been proposed for it. Feature extraction is the central part of SER methods and aims to obtain effective emotional features from the speech signal. Among the most important features in speech processing tasks are Mel Frequency Cepstral Coefficients (MFCCs). Information about the vocal production mechanisms of speakers in different emotional states can improve the discriminative ability of these features for the SER task. This work proposes a novel feature extraction scheme for SER that integrates this information by decomposing emotional speech spectra, providing an improved spectral representation of various emotions. Two novel procedures based on this scheme are presented. In the first procedure, cepstral-like features are obtained from a filter bank that is computed by applying the Non-negative Matrix Factorization (NMF) technique to emotional speech spectra. In the second procedure, the NMF activation coefficients, obtained by decomposing the speech spectra, are used as the new features. Finally, to increase the discriminative ability of the features among emotion classes, each feature vector is normalized with respect to its mean value. In experiments on the Emo-DB database, fusing the proposed features with MFCCs improves the performance of an SER system over both conventional MFCCs as the baseline and simple unsupervised NMF-based features derived from the speech spectra.
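The two procedures described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the NMF cost function, the rank, the cepstral transform, or the exact form of the mean normalization, so the code below assumes Euclidean multiplicative updates, a log followed by a DCT-II (by analogy with MFCCs), and division of each feature vector by its own mean. All function names and parameters are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumption;
    the abstract does not name the cost function).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def dct2(x):
    """Unnormalized DCT-II along axis 0."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def cepstral_like_features(V, W, eps=1e-10):
    """Procedure 1 (assumed form): use the NMF basis W as a filter bank,
    then take the log and a DCT, as is done for MFCCs."""
    energies = W.T @ V                      # (rank, frames)
    return dct2(np.log(energies + eps))

def activation_features(V, W, n_iter=100, eps=1e-10, seed=0):
    """Procedure 2 (assumed form): keep W fixed and estimate only the
    activation coefficients H for the given spectra."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def mean_normalize(F, eps=1e-10):
    """Divide each feature vector (column) by its own mean.
    This reading of 'normalized to its mean value' is an assumption,
    and it is only well behaved for non-negative features."""
    return F / (F.mean(axis=0, keepdims=True) + eps)
```

In this sketch, `V` stands for a magnitude spectrogram of emotional speech with frequency bins along the rows and frames along the columns. The basis `W` would be learned once on training spectra and then reused to extract either feature type from new utterances.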
