Research on emotion recognition attracts wide interest because of its applications in education, marketing, and medicine. This study proposes a multi-branch convolutional neural network with a cross-attention mechanism (MCNN-CA) for accurate recognition of different emotions. The proposed model automatically extracts relevant features from multimodal data and fuses feature maps from diverse sources as modules for subsequent emotion recognition. In the feature extraction stage, separate convolutional branches were designed to extract critical information from features of different dimensions. The feature fusion module strengthens the inter-correlations between features using an efficient channel attention mechanism, which proves effective in fusing distinctive features both within a single modality and across modalities. The model was assessed in EEG emotion recognition experiments on the SEED and SEED-IV datasets, and its effectiveness in multimodal settings was further evaluated using EEG and text data from the ZuCo dataset. Comparative analysis with contemporary studies shows that our model excels in terms of accuracy, precision, recall, and F1-score.
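The abstract does not give the fusion module's internals, but an efficient channel attention mechanism of the kind it names typically pools each channel to a scalar descriptor, applies a small 1-D convolution across neighbouring channels, and gates the channels with a sigmoid. The NumPy sketch below illustrates that general pattern only; the function names, the fixed averaging kernel (standing in for learned weights), and the concatenation-based fusion are all assumptions, not the paper's implementation.

```python
import numpy as np

def channel_attention(feature_map, k=3):
    """ECA-style channel weights (illustrative sketch, not the paper's code).

    feature_map: array of shape (C, T) -- C channels, T samples per channel.
    Returns per-channel gate values in (0, 1).
    """
    # Global average pooling over the sample axis -> one descriptor per channel.
    desc = feature_map.mean(axis=1)                      # shape (C,)
    # 1-D convolution across k neighbouring channels ("same" padding).
    pad = k // 2
    padded = np.pad(desc, pad, mode="edge")
    kernel = np.ones(k) / k                              # stand-in for learned weights
    conv = np.array([padded[i:i + k] @ kernel for i in range(len(desc))])
    # Sigmoid gate yields the channel attention weights.
    return 1.0 / (1.0 + np.exp(-conv))

def fuse_branches(branch_a, branch_b, k=3):
    """Reweight each branch by its channel attention, then concatenate channels."""
    wa = channel_attention(branch_a, k)[:, None]         # shape (C, 1)
    wb = channel_attention(branch_b, k)[:, None]
    return np.concatenate([branch_a * wa, branch_b * wb], axis=0)
```

In a multimodal setting, `branch_a` and `branch_b` would be feature maps from different branches (e.g. EEG and text encoders); the cross-channel convolution is what keeps the attention "efficient", since it uses only k parameters rather than a full channel-mixing layer.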