Abstract
Speech emotion recognition is an emerging research field and is expected to benefit many application domains by providing an effective human–computer interface. Researchers are working extensively to decode human emotions from the speech signal in order to achieve effective interfaces and smart responses from computers. The accuracy of speech emotion recognition depends heavily on the type of features used and on the classifier employed for recognition. The contribution of this paper is to evaluate twelve different long short-term memory (LSTM) network models as classifiers based on Mel-frequency cepstral coefficient (MFCC) features. The paper presents a performance evaluation in terms of precision, recall, F-measure, and accuracy for four emotions (happy, neutral, sad, and angry) using an emotional speech database, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The accuracy obtained is 89%, which is 9.5% higher than reported in recent literature. The best-suited LSTM model is further implemented on a Raspberry Pi board, creating a stand-alone speech emotion recognition system.

Keywords: Human–computer interaction, SER, MFCC, LSTM, Speech emotion recognition
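As a rough illustration of the pipeline the abstract describes (not the authors' code), the sketch below extracts MFCC features with librosa and feeds them to a small Keras LSTM classifier over the four emotion classes. The number of coefficients, the fixed sequence length, and the layer sizes are all assumptions for illustration; the paper itself compares twelve LSTM topologies.

```python
# Minimal sketch (assumed, not the authors' implementation):
# MFCC feature extraction + LSTM classifier for 4-class speech
# emotion recognition, using librosa and Keras.
import numpy as np
import librosa
from tensorflow.keras import layers, models

EMOTIONS = ["happy", "neutral", "sad", "angry"]  # classes from the paper
N_MFCC = 40        # assumed number of MFCC coefficients
MAX_FRAMES = 200   # assumed fixed sequence length (pad/truncate)

def extract_mfcc(path: str) -> np.ndarray:
    """Load an audio file and return a (MAX_FRAMES, N_MFCC) MFCC matrix."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T  # (frames, n_mfcc)
    # Pad or truncate to a fixed length so utterances batch cleanly.
    if mfcc.shape[0] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def build_model() -> models.Model:
    """One possible LSTM topology; hyperparameters here are illustrative."""
    model = models.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(64),
        layers.Dense(len(EMOTIONS), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A model of this form, once trained, can be exported and run for inference on a Raspberry Pi (e.g. via TensorFlow Lite), which is in the spirit of the stand-alone deployment the abstract reports.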