Abstract
This paper describes a new posed multimodal emotional dataset and compares human emotion classification based on four different modalities: audio, video, electromyography (EMG), and electroencephalography (EEG). Results are reported for several baseline approaches using various feature extraction techniques and machine-learning algorithms. First, we collected a dataset from 11 human subjects expressing six basic emotions and one neutral emotion. We then extracted features from each modality using principal component analysis (PCA), an autoencoder, a convolutional network, and mel-frequency cepstral coefficients (MFCC), some of which are specific to individual modalities. We applied a number of baseline models to compare classification performance in emotion recognition, including k-nearest neighbors (KNN), support vector machines (SVM), random forest, multilayer perceptron (MLP), long short-term memory (LSTM), and convolutional neural network (CNN) models. Our results show that bootstrapping the biosensor signals (i.e., EMG and EEG) can greatly increase emotion classification performance by reducing noise. For these biosensor signals, the best classification results were obtained by a traditional KNN, whereas audio and image sequences of human emotions could be better classified using LSTM.
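The abstract does not detail how the biosensor signals are bootstrapped. Purely as an illustration, the sketch below assumes one common form, bootstrap averaging: given several repeated recordings (trials) of the same expression, it resamples the trials with replacement, averages each resample sample by sample, and then averages the resample means. The helper `bootstrap_average`, its parameters, and the notion of equal-length "trials" are hypothetical assumptions, not the authors' procedure. Because independent noise partly cancels when trials are averaged, the result tracks the underlying signal more closely than any single trial.

```python
import random
from statistics import fmean


def bootstrap_average(trials, n_resamples=100, seed=0):
    """Hypothetical helper: bootstrap-averaged signal from repeated trials.

    ASSUMPTION: `trials` is a list of equal-length lists of samples (e.g. one
    EMG or EEG channel recorded several times for the same expression). This
    illustrates bootstrap averaging in general, not the paper's exact method.
    Each resample draws len(trials) trials with replacement and averages them
    sample by sample; the mean over all resamples is returned.
    """
    if not trials:
        raise ValueError("need at least one trial")
    length = len(trials[0])
    if any(len(t) != length for t in trials):
        raise ValueError("all trials must have the same length")
    if n_resamples < 1:
        raise ValueError("n_resamples must be positive")

    rng = random.Random(seed)  # seeded so results are reproducible
    resample_means = []
    for _ in range(n_resamples):
        # Draw a bootstrap resample of trials (with replacement).
        picked = [rng.choice(trials) for _ in range(len(trials))]
        # Average the resampled trials at every time step.
        resample_means.append([fmean(col) for col in zip(*picked)])
    # Average the resample means at every time step.
    return [fmean(col) for col in zip(*resample_means)]
```

In expectation, the mean of the resample averages equals the plain average of the trials, so the noise reduction itself comes from averaging; resampling is mainly worthwhile when the spread across resamples is also kept, for example as a confidence band around the averaged signal.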
Highlights
In daily life, emotions abound, and there are countless reasons for determining someone’s emotional state, including better communication and work efficiency.
The VGG16 features with the long short-term memory (LSTM) model achieve a mean accuracy of 67.20%, which is over 10% better than autoencoder features and 20% better than principal component analysis (PCA) features.
We examined various feature extraction techniques (MFCC, PCA, autoencoder, and pre-trained convolutional neural network (CNN)) and machine learning models (KNN, support vector machines (SVM), random forest, multilayer perceptron (MLP), CNN, and LSTM) for each modality.
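Because the abstract credits a traditional KNN with the best classification results, a minimal pure-Python sketch of k-nearest-neighbor classification may help make the idea concrete. It assumes each recording has already been reduced to a fixed-length feature vector (for example by PCA or an autoencoder) and uses Euclidean distance with a majority vote; the function name `knn_predict`, the distance metric, and the tie-breaking rule are illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_X: Sequence[Sequence[float]],
                train_y: Sequence[Hashable],
                query: Sequence[float],
                k: int = 3) -> Hashable:
    """Hypothetical helper: majority-vote k-nearest-neighbor classifier.

    ASSUMPTIONS: Euclidean distance on precomputed feature vectors; a tie in
    the vote goes to the tied label whose member is nearest to the query.
    """
    if len(train_X) != len(train_y):
        raise ValueError("train_X and train_y must have the same length")
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair each training vector's distance to the query with its label and
    # sort on distance only, so labels of different types are never compared.
    ranked = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    neighbors = [label for _, label in ranked[:k]]
    counts = Counter(neighbors)
    top = max(counts.values())
    # Neighbors are in distance order, so the first label reaching the top
    # count is the tied label with the closest member.
    return next(label for label in neighbors if counts[label] == top)
```

For example, with toy 2-D vectors labeled "neutral" and "happy", `knn_predict(X, y, query, k=3)` returns the majority label among the three closest training vectors. In practice, features are usually standardized first so that no single dimension dominates the distance.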
Summary
Emotions abound, and there are countless reasons for determining someone’s emotional state, including better communication and work efficiency. In product development, analyzing users’ emotional states during their user experience can help make product features and design better suited to them. Caregivers can provide better care to patients if the patients’ emotional states in different situations are known. Many emotion classification studies combine deep learning methods with state-of-the-art statistical techniques to optimize emotion detection accuracy, and attempt to integrate multiple modalities for better accuracy. As detailed in the Related Work section, growing attention to emotion recognition has led to the collection of many emotional datasets, including both nonphysiological signals (e.g., facial expressions and speech) and physiological signals (e.g., electroencephalogram (EEG), electromyogram (EMG), and electrooculogram (EOG)).