Abstract

Human speech conveys both the semantic content of the uttered words and the underlying emotional state of the speaker. Emotion identification is important because it can add features to many applications that improve human-computer interaction; such improvements help retain customer satisfaction and loyalty in the long run and serve as an attraction for new customers. Although researchers have applied many approaches to recognizing emotion from speech, none can claim that their findings are universally superior, because different feature extraction methods coupled with different classifiers can yield different performance depending on the data used. This paper presents a comparative analysis of a speech emotion identification system using two feature extraction methods, Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Coefficients (LPC), coupled with a Multilayer Perceptron (MLP) classifier. For further exploration, different numbers of MFCC filters are employed to observe their effect on performance. The results indicate that MFCC-40 performs slightly better than the other MFCC configurations on the Berlin EMO-DB and NTU_American datasets, whereas MFCC-20 performs best on NTU_Asian. It is also observed that MFCC consistently outperformed LPC in all experiments, in line with many reported findings. This understanding can inform further study of speech emotion toward more robust, lower-error systems in the future.
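The quantity the abstract varies, the number of mel filters (e.g. MFCC-20 vs MFCC-40), enters the MFCC computation at the filterbank stage. The per-frame pipeline can be sketched as below; this is an illustrative numpy-only implementation, not the authors' toolchain, and the frame length, sample rate, and 13-coefficient output are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale.
    Returns an array of shape (n_filters, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(frame, sr, n_filters=40, n_ceps=13):
    """MFCCs for one frame: power spectrum -> mel filterbank ->
    log energies -> DCT-II. n_filters is the knob that distinguishes
    MFCC-20 from MFCC-40."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    fb = mel_filterbank(n_filters, len(frame), sr)
    log_energies = np.log(fb @ spec + 1e-10)     # floor avoids log(0)
    # DCT-II matrix to decorrelate the log filterbank energies
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return dct @ log_energies

# Example: a 25 ms frame (400 samples at 16 kHz) of a 440 Hz tone
frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)
feat20 = mfcc(frame, 16000, n_filters=20)
feat40 = mfcc(frame, 16000, n_filters=40)
```

Changing `n_filters` only alters the mel resolution of the log energies before the DCT; the feature dimensionality fed to the MLP stays the same, which is what makes the MFCC-20 / MFCC-40 comparison a like-for-like one.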
