Abstract

In recent years, human-computer interaction (HCI) systems have become increasingly popular. Some of these systems demand particular approaches for discriminating actual emotions through the use of better multimodal methods. In this work, a deep canonical correlation analysis (DCCA) based multimodal emotion recognition method is presented through the fusion of electroencephalography (EEG) and facial video clips. A two-stage framework is implemented, where the first stage extracts relevant features for emotion recognition using a single modality, while the second stage merges the highly correlated features from the two modalities and performs classification. Convolutional neural network (CNN) based Resnet50 and 1D-CNN (1-Dimensional CNN) have been utilized to extract features from facial video clips and EEG modalities, respectively. A DCCA-based approach was used to fuse highly correlated features, and three basic human emotion categories (happy, neutral, and sad) were classified using the SoftMax classifier. The proposed approach was investigated based on the publicly available datasets called MAHNOB-HCI and DEAP. Experimental results revealed an average accuracy of 93.86% and 91.54% on the MAHNOB-HCI and DEAP datasets, respectively. The competitiveness of the proposed framework and the justification for exclusivity in achieving this accuracy were evaluated by comparison with existing work.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call