Abstract
This paper explores the Linear Prediction (LP) residual of the speech signal for characterizing basic emotions. The emotions used in this study are anger, compassion, disgust, fear, happy, neutral, sarcastic and surprise. The LP residual is obtained by inverse filtering the speech signal with the predictor coefficients estimated through LP analysis. The LP residual mainly contains higher-order relations among the samples. To capture the emotion-specific information in these higher-order relations, autoassociative neural networks (AANNs) and Gaussian mixture models (GMMs) are used. The decrease in error during the training phase of the AANNs and the emotion recognition performance of the models demonstrate that the excitation source component of speech contains emotion-specific information and that this information is captured by the AANN and GMM models. The IITKGP-Simulated Emotion Speech Corpus (IITKGP-SESC) is used as the database for characterization and classification of emotions. The emotion recognition performance is observed to be about 56%.
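The inverse-filtering step that yields the LP residual can be sketched as follows. This is a minimal illustration, not the paper's implementation: the autocorrelation method, the default order of 10, the small diagonal loading term, and the function names `lp_coefficients` and `lp_residual` are all assumptions made for the example.

```python
import numpy as np

def lp_coefficients(frame, order=10):
    """Estimate predictor coefficients a[1..p] so that s[n] ~ sum_k a[k] * s[n-k].

    Uses the autocorrelation method (an assumption; the abstract does not
    state which LP formulation or order the authors used).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Tiny diagonal loading keeps the system well conditioned (assumption)
    R += 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])

def lp_residual(frame, order=10):
    """Inverse filter the frame with A(z) = 1 - sum_k a[k] z^-k."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    inverse_filter = np.concatenate(([1.0], -a))
    # Truncate the convolution so the residual has the same length as the frame
    return np.convolve(frame, inverse_filter)[:len(frame)]
```

In practice the speech signal would be processed in short frames, and the resulting residual samples, rather than the spectral envelope, would be what the AANN or GMM models are trained on.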