Abstract

Automatic recognition of human emotional states has recently attracted considerable attention in human-computer interaction and emotional brain-computer interface research. However, the accuracy of emotion recognition remains unsatisfactory. Taking advantage of the complementary information that deep learning can extract from multiple emotion-related modalities, this study proposes a novel emotion recognition architecture that fuses emotional features from electroencephalography (EEG) signals with the corresponding audio signals on the DEAP dataset. We use a convolutional neural network (CNN) to extract EEG features and a bidirectional long short-term memory (BiLSTM) network to extract audio features, and then combine the multi-modal features in a deep learning architecture to classify arousal and valence levels. The results show improved accuracy at both the arousal and valence levels compared with previous studies that used EEG signals alone, which suggests the effectiveness of the proposed multi-modal fused emotion recognition model. In future work, multi-modal data from natural interaction scenes will be collected and fed into this architecture to further validate the effectiveness of the method.
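To make the described fusion architecture concrete, the sketch below shows one possible realization in PyTorch: a CNN branch for EEG segments, a BiLSTM branch for audio feature sequences, and a shared classifier over the concatenated features. All layer sizes, input shapes, and the class name `MultiModalEmotionNet` are illustrative assumptions, not details given by the abstract.

```python
import torch
import torch.nn as nn

class MultiModalEmotionNet(nn.Module):
    """Sketch of a CNN (EEG) + BiLSTM (audio) fusion network.
    Layer sizes and shapes are assumptions for illustration only."""

    def __init__(self, audio_feat_dim=40, num_classes=2):
        super().__init__()
        # CNN branch for EEG: input shape (batch, 1, channels, time)
        self.eeg_cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(3, 5), padding=(1, 2)),
            nn.ReLU(),
            nn.MaxPool2d((1, 4)),
            nn.Conv2d(16, 32, kernel_size=(3, 5), padding=(1, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        # BiLSTM branch for audio: input shape (batch, time, features)
        self.audio_bilstm = nn.LSTM(
            input_size=audio_feat_dim, hidden_size=64,
            batch_first=True, bidirectional=True,
        )
        # Fusion head: concatenated features -> arousal or valence level
        self.classifier = nn.Sequential(
            nn.Linear(32 + 2 * 64, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, eeg, audio):
        eeg_feat = self.eeg_cnn(eeg).flatten(1)           # (batch, 32)
        _, (h, _) = self.audio_bilstm(audio)              # h: (2, batch, 64)
        audio_feat = torch.cat([h[0], h[1]], dim=1)       # (batch, 128)
        fused = torch.cat([eeg_feat, audio_feat], dim=1)  # (batch, 160)
        return self.classifier(fused)                     # class logits


# Example usage with random tensors standing in for DEAP-style inputs
model = MultiModalEmotionNet()
eeg = torch.randn(8, 1, 32, 512)   # 8 EEG segments, 32 channels, 512 samples
audio = torch.randn(8, 100, 40)    # 8 audio sequences of 40-dim frames
logits = model(eeg, audio)         # (8, 2), e.g. low vs. high arousal
```

In this feature-level fusion scheme, two separate binary classifiers of this form (one for arousal, one for valence) would reproduce the two-level prediction task described in the abstract; the exact fusion strategy used in the paper may differ.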
