Abstract
Multimodal emotion recognition is an emerging interdisciplinary field of research in the area of affective computing and sentiment analysis. It aims at exploiting the information carried by signals of different natures to make emotion recognition systems more accurate, which requires a powerful multimodal fusion method. In this study, a hybrid multimodal data fusion method is proposed in which the audio and visual modalities are fused using a latent space linear map; their features, projected into the cross-modal space, are then fused with the textual modality using a Dempster-Shafer (DS) theory-based evidential fusion method. The evaluation of the proposed method on the videos of the DEAP dataset shows its superiority over both decision-level and non-latent space fusion methods. Furthermore, the results reveal that employing Marginal Fisher Analysis (MFA) for feature-level audio-visual fusion yields a larger improvement than cross-modal factor analysis (CFA) and canonical correlation analysis (CCA). The implementation results also show that exploiting users' textual comments together with the audiovisual content of movies improves the performance of the system.
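The paper's exact pipeline is not reproduced here, but the following sketch illustrates the general idea under stated assumptions: audio and visual feature vectors are projected into a shared latent space with a linear map (CCA is used for illustration; the paper also evaluates CFA and MFA), and the resulting audio-visual evidence is then combined with textual evidence by Dempster's rule of combination. All feature dimensions, class labels, mass values, and helper names are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# --- Latent-space (feature-level) audio-visual fusion via CCA ---
# X_audio, X_visual: (n_samples, d_a) and (n_samples, d_v) feature matrices
# (hypothetical shapes; the paper also considers CFA and MFA for this step).
rng = np.random.default_rng(0)
X_audio = rng.normal(size=(200, 40))
X_visual = rng.normal(size=(200, 60))

cca = CCA(n_components=10)
Z_audio, Z_visual = cca.fit_transform(X_audio, X_visual)  # projections into the shared space
Z_av = np.hstack([Z_audio, Z_visual])                      # fused audio-visual representation

# --- Dempster-Shafer combination of audio-visual and textual evidence ---
# Mass functions are dictionaries over subsets (frozensets) of the frame of
# discernment; three illustrative emotion classes are used here.
FRAME = frozenset({"positive", "negative", "neutral"})

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    combined, conflict = {}, 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2
    if conflict >= 1.0:
        raise ValueError("Total conflict; Dempster's rule is undefined.")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical masses produced by the audio-visual and textual classifiers
# (e.g., normalized classifier scores, with some mass kept on the whole frame).
m_av = {frozenset({"positive"}): 0.6, frozenset({"negative"}): 0.2, FRAME: 0.2}
m_text = {frozenset({"positive"}): 0.5, frozenset({"neutral"}): 0.3, FRAME: 0.2}

m_fused = dempster_combine(m_av, m_text)
decision = max((k for k in m_fused if len(k) == 1), key=lambda k: m_fused[k])
print("predicted emotion:", next(iter(decision)))
```

Keeping part of each modality's mass on the full frame encodes that classifier's uncertainty, which is what makes the evidential combination more informative than simply averaging class scores.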
Highlights
Emotion recognition is the process of specifying the affective state of people
It plays an important role in affective computing and human-computer interaction (HCI) applications [1]
We propose a hybrid fusion method for multimodal emotion recognition which benefits from both feature- and decision-level fusion
Summary
Emotion recognition is the process of specifying the affective state of people. It plays an important role in affective computing and human-computer interaction (HCI) applications [1]. Different applications benefit from emotion recognition, including video games [3], military healthcare [4], tutoring systems [5], predicting customer satisfaction [6], and Twitter analysis [7]. Multimodal emotion recognition has attracted increasing attention from researchers as it can overcome the limitations of monomodal systems [8]–[10]. Multimodal emotion recognition fuses complementary information of different modalities at different fusion levels. These levels can be classified into two categories: fusion prior to matching (feature level) and fusion after matching (decision level).
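As a rough illustration of these two categories (all feature sizes and scores below are hypothetical): fusion prior to matching combines modality features before a single classifier, whereas fusion after matching combines the outputs of per-modality classifiers.

```python
import numpy as np

# Hypothetical per-modality features and per-modality classifier scores.
x_audio, x_visual = np.random.rand(40), np.random.rand(60)
p_audio = np.array([0.6, 0.3, 0.1])   # class probabilities from an audio classifier
p_visual = np.array([0.5, 0.4, 0.1])  # class probabilities from a visual classifier

# Fusion prior to matching (feature level): combine features, then classify once.
x_fused = np.concatenate([x_audio, x_visual])  # a single classifier would consume x_fused

# Fusion after matching (decision level): classify per modality, then combine decisions,
# e.g. by averaging probabilities (the paper instead uses DS-theory evidential fusion).
p_fused = (p_audio + p_visual) / 2
predicted_class = int(np.argmax(p_fused))
```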