Abstract

The aim of this project is to research and classify human emotions from speech. A method for recognizing emotion in speech signals is introduced, based on the Long Short-Term Memory (LSTM) network, a type of recurrent neural network (RNN). RNNs are suited to analyzing sequential data, which makes them useful for speech signal recognition. Several publicly available datasets were considered for this project, e.g. TESS (Toronto Emotional Speech Set), RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song), SAVEE (Surrey Audio-Visual Expressed Emotion), and CREMA-D (Crowd-Sourced Emotional Multimodal Actors Dataset). The main dataset used in this project is TESS, and Mel-Frequency Cepstral Coefficients (MFCCs) are used for feature extraction.
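The pipeline the abstract describes (a sequence of MFCC feature frames processed step by step by an LSTM) can be illustrated with a minimal from-scratch LSTM step in NumPy. This is a sketch only: the dimensions (13 MFCCs per frame, hidden size 8, 5 frames) and the random stand-in frames are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

np.random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gates are stacked in the order: input i, forget f, output o, candidate g.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new info to write
    f = sigmoid(z[H:2*H])        # forget gate: how much old memory to keep
    o = sigmoid(z[2*H:3*H])      # output gate: how much memory to expose
    g = np.tanh(z[3*H:4*H])      # candidate values for the cell state
    c = f * c_prev + i * g       # cell state carries long-term memory
    h = o * np.tanh(c)           # hidden state is the step's output
    return h, c

# Run a toy sequence of MFCC-like frames through the cell
# (illustrative sizes: 13 coefficients per frame, hidden size 8, 5 frames).
D, H, T = 13, 8, 5
W = np.random.randn(4 * H, D) * 0.1
U = np.random.randn(4 * H, H) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    x = np.random.randn(D)       # stand-in for one MFCC frame
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)                   # final hidden state, (8,)
```

In an actual emotion classifier, the final hidden state (or the sequence of hidden states) would be passed to a dense softmax layer over the emotion classes; in practice one would use a deep-learning framework's LSTM layer rather than this hand-rolled step.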
