Abstract
Emotion recognition is an essential technique for interaction between human beings and artificial intelligence systems. For effective emotion recognition in the continuous-time domain, this article presents a multimodal fusion network that integrates a video modality network and an electroencephalogram (EEG) modality network. To calculate the attention weights of the facial video features and the corresponding EEG features during fusion, a multimodal attention network that uses bilinear pooling based on low-rank decomposition is proposed. Finally, continuous-domain valence values are computed from the outputs of the two modality networks and the attention weights. Experimental results show that the proposed fusion network improves performance by about 6.9% over the video modality network on the MAHNOB human computer interface (MAHNOB-HCI) dataset. A performance improvement was also achieved on our proprietary dataset.
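The fusion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the excerpt does not give the network architecture, so the feature sizes, the rank, the random projection matrices `U`, `V`, and `P`, the `tanh` nonlinearity, and the softmax normalization are all assumptions chosen for illustration. The sketch approximates a bilinear interaction between the two feature vectors with an element-wise product of low-rank projections, turns the result into two attention weights, and uses them to combine the per-modality valence outputs.

```python
import numpy as np


def low_rank_bilinear_attention(video_feat, eeg_feat, U, V, P):
    """Return two attention weights (video, EEG) that sum to 1.

    Hypothetical helper: a full bilinear form x^T W y is approximated by
    projecting each modality to a shared low-rank space and multiplying
    the projections element-wise.
    """
    joint = np.tanh(U.T @ video_feat) * np.tanh(V.T @ eeg_feat)  # shape (rank,)
    logits = P.T @ joint                                         # shape (2,)
    logits = logits - logits.max()                               # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()                                       # softmax


def fuse_valence(valence_video, valence_eeg, weights):
    """Weighted sum of the two modality outputs (assumed fusion rule)."""
    return weights[0] * valence_video + weights[1] * valence_eeg


# Illustrative sizes and random parameters; a trained model would learn these.
rng = np.random.default_rng(0)
video_dim, eeg_dim, rank = 128, 64, 16
U = rng.normal(scale=0.1, size=(video_dim, rank))
V = rng.normal(scale=0.1, size=(eeg_dim, rank))
P = rng.normal(scale=0.1, size=(rank, 2))

video_feat = rng.normal(size=video_dim)  # stand-in for video network features
eeg_feat = rng.normal(size=eeg_dim)      # stand-in for EEG network features

weights = low_rank_bilinear_attention(video_feat, eeg_feat, U, V, P)
fused = fuse_valence(0.4, -0.2, weights)  # example per-modality valence outputs
print(weights, fused)
```

Because the weights are non-negative and sum to one in this sketch, the fused valence always lies between the two modality outputs. The low-rank projections need `rank * (video_dim + eeg_dim)` parameters, whereas a full bilinear weight tensor would need `video_dim * eeg_dim` parameters per output.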
Highlights
Recognition of human emotions is a key technology for ultimate human–robot interaction (HRI)
Various emotion recognition mechanisms based on convolutional neural networks (CNNs), trained in an end-to-end manner, have been developed and have shown reliable performance [3], [4]
If we can analyze the characteristics of the two modalities and calculate their weights, we can achieve a synergy between the video and EEG modalities for emotion recognition
Summary
Recognition of human emotions is a key technology for ultimate human–robot interaction (HRI). Conventional emotion recognition algorithms distinguished emotion categories by detecting changes in facial expressions [1], [2]. Various emotion recognition mechanisms based on convolutional neural networks (CNNs), trained in an end-to-end manner, have been developed and have shown reliable performance [3], [4]. There have also been many attempts to recognize human emotions from the tone of voice signals [5]. Because voice information is temporally sparse, these voice tone-based schemes are fundamentally limited in extracting emotions continuously over time. Several emotion recognition algorithms using EEG, an electrical bio-signal generated in the human brain, have been reported [6]–[8].