Performance comparison of speaker and emotion recognition

A. Revathy, V. Mohan, P. Shanmugapriya

doi:10.1109/icscn.2015.7219844

Abstract

This paper examines the effectiveness of the Hidden Markov Model Toolkit (HTK) for recognizing speech, speaker and emotion from emotional speech, using Mel-frequency cepstral coefficients (MFCC) as features. Four systems are proposed and their performance is analyzed: emotion-independent speech recognition, speaker-independent speech recognition, emotion-independent speaker recognition and speaker-independent emotion recognition. The EMO-DB database is used, with 80% of the data for training and 20% for testing. The systems achieve average accuracies of 100%, 97%, 90% and 68% for speaker-independent speech recognition, emotion-independent speech recognition, speaker recognition and emotion recognition, respectively. Because the HTK-based system gives good results for emotional speech recognition, the speaker-independent and emotion-independent emotional speech recognition system is also evaluated on noisy test speech, with Volvo noise, white noise and F16 noise as the noise types considered. Accuracy improves when an additional noise-reduction preprocessing step is applied before the conventional preprocessing.
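The 80%/20% train/test protocol and the average-accuracy figures can be illustrated with a minimal sketch. The abstract does not say how the split was drawn, so the per-class (stratified) shuffling, the fixed seed, and the names `split_80_20` and `accuracy` are assumptions made here for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def split_80_20(utterances, train_fraction=0.8, seed=0):
    """Split (utterance_id, label) pairs into train and test lists.

    Assumption: the split is made separately within each label so that
    every class keeps roughly the same 80/20 proportion.
    """
    by_label = defaultdict(list)
    for utt_id, label in utterances:
        by_label[label].append(utt_id)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        cut = int(round(train_fraction * len(ids)))
        train.extend((u, label) for u in ids[:cut])
        test.extend((u, label) for u in ids[cut:])
    return train, test


def accuracy(predicted, reference):
    """Percentage of test items whose predicted label matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)
```

With ten hypothetical utterances per emotion label, `split_80_20` places eight in the training list and two in the test list for each label; `accuracy` is then computed on the test list only.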

Full Text