Abstract

Speech Emotion Recognition is an emerging research field expected to benefit many application domains by enabling more effective human-computer interfaces. Researchers are working extensively on decoding human emotions from the speech signal in order to achieve more natural interaction and smarter responses from computers. The performance of speech emotion recognition depends greatly on both the types of features used and the classifier employed for recognition. The contribution of this paper is an evaluation of twelve different Long Short-Term Memory (LSTM) network models as classifiers operating on Mel-Frequency Cepstral Coefficient (MFCC) features. The paper presents a performance evaluation in terms of precision, recall, F-measure, and accuracy for four emotions (happy, neutral, sad, and angry) using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The accuracy obtained is 89%, which is 9.5% higher than reported in recent literature. The most suitable LSTM model is further implemented on a Raspberry Pi board, creating a standalone Speech Emotion Recognition system.
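
For illustration, a minimal sketch of the MFCC-plus-LSTM pipeline described above follows. This is not the authors' implementation: the library choices (librosa for MFCC extraction, TensorFlow/Keras for the LSTM), the number of coefficients, the fixed sequence length, and the layer sizes are all assumptions made for the example.

    # Sketch only, not the paper's code: MFCC features fed to an LSTM
    # classifier with four output classes. Assumes librosa and TensorFlow.
    import numpy as np
    import librosa
    import tensorflow as tf

    EMOTIONS = ["happy", "neutral", "sad", "angry"]  # the four classes evaluated
    N_MFCC = 40        # assumed number of MFCC coefficients per frame
    MAX_FRAMES = 200   # assumed fixed sequence length (pad or truncate)

    def extract_mfcc(wav_path):
        """Load one utterance and return an (MAX_FRAMES, N_MFCC) matrix."""
        signal, sr = librosa.load(wav_path, sr=None)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T  # (frames, N_MFCC)
        if mfcc.shape[0] < MAX_FRAMES:  # zero-pad clips shorter than MAX_FRAMES
            mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
        return mfcc[:MAX_FRAMES]

    def build_model():
        """One of many possible LSTM topologies; layer sizes are illustrative."""
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(MAX_FRAMES, N_MFCC)),
            tf.keras.layers.LSTM(128, return_sequences=True),
            tf.keras.layers.LSTM(64),
            tf.keras.layers.Dense(len(EMOTIONS), activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

In practice, such a model would be trained on MFCC matrices extracted from the RAVDESS utterances; the paper's contribution lies in comparing twelve LSTM topologies of this kind and deploying the best one on a Raspberry Pi.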
