This article presents a systematic review of the evolution of sentiment analysis techniques, from unimodal to multimodal and multi-scenario methodologies, with an emphasis on the integration and application of deep learning. First, the paper lays out the theoretical foundations of sentiment analysis, including the definition and classification of affect and emotion. It then examines the key technologies of unimodal sentiment analysis in the text, speech, and image domains, covering feature extraction, feature representation, and classification models. The focus then shifts to multimodal sentiment analysis: the paper surveys widely used multimodal sentiment datasets, feature representation and fusion techniques, and deep learning-based multimodal sentiment analysis models such as attention networks and graph neural networks. It further addresses the application of these techniques in social media, product reviews, and public opinion monitoring. Finally, the paper underscores the challenges that persist in multimodal sentiment fusion, including data imbalance and disparities in feature expression, and calls for further research on cross-modal feature representation, dataset augmentation, and explainable modeling to improve the performance of complex sentiment analysis across multiple scenarios.