Sentiment analysis is a key component of many social media analysis projects, yet prior research has largely concentrated on a single modality, such as text descriptions of visual content. Unlike standard image databases, social images frequently link to one another, which makes sentiment analysis challenging; most existing methods treat images individually and are therefore ineffective for interrelated images. In this paper, we propose a hybrid Arithmetic Optimization Algorithm-Hunger Games Search (AOA-HGS)-optimized Ensemble Multi-scale Residual Attention Network (EMRA-Net) that exploits the correlations among modalities, including text, audio, social links, and video, for more effective multimodal sentiment analysis. The hybrid AOA-HGS technique learns complementary and comprehensive features. EMRA-Net comprises two segments, an Ensemble Attention CNN (EA-CNN) and a Three-scale Residual Attention Convolutional Neural Network (TRA-CNN), to analyze multimodal sentiment. Adding a wavelet transform to TRA-CNN reduces the loss of spatial-domain image texture features, while EA-CNN performs feature-level fusion of the visual, audio, and textual information. Evaluated on the Multimodal EmotionLines Dataset (MELD) and the EmoryNLP dataset, the proposed method significantly outperforms the existing multimodal sentiment analysis techniques HALCB, HDF, and MMLatch. Moreover, across varying training-set sizes, it surpasses the other techniques in recall, accuracy, F-score, and precision, and requires less computation time on both datasets.
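To illustrate the feature-level fusion idea attributed to EA-CNN above, the following is a minimal sketch, not the paper's actual implementation: each modality embedding is scored, the scores are normalized with a softmax to obtain attention weights, and the weighted features are concatenated into a single fused vector. The scoring rule (mean activation) and the function names are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def feature_level_fusion(visual, audio, text):
    """Hypothetical sketch of attention-weighted feature-level fusion:
    score each modality embedding, normalize the scores with softmax,
    and concatenate the attention-weighted features."""
    feats = [visual, audio, text]
    # toy per-modality score: mean activation (an assumption, not the paper's rule)
    scores = np.array([f.mean() for f in feats])
    weights = softmax(scores)
    fused = np.concatenate([w * f for w, f in zip(weights, feats)])
    return fused, weights

# usage: three 8-dimensional modality embeddings fused into one 24-dim vector
rng = np.random.default_rng(0)
v, a, t = (rng.standard_normal(8) for _ in range(3))
fused, weights = feature_level_fusion(v, a, t)
```

Feature-level (early) fusion, as used here, combines modality representations before classification, in contrast to decision-level fusion, which merges per-modality predictions.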