Multimodal sentiment analysis is an important research area in artificial intelligence. It extracts features from several human modalities, such as facial expressions, body movements, and voice, fuses the modalities, and finally classifies and predicts emotions. The technology can be applied in many scenarios, such as stock prediction, product analysis, and movie box-office prediction. It is especially relevant to psychological state analysis, which gives it considerable research significance. This paper introduces two important datasets for multimodal sentiment analysis, CMU-MOSEI and IEMOCAP. It also reviews multimodal fusion methods, including feature-level, model-level, and decision-level fusion. It then describes two related models, the semantic feature fusion neural network and the sentiment word perception fusion network. Finally, it discusses how multimodal sentiment analysis models are applied to depression and other related mental illnesses, as well as the challenges these models face in the future. We hope this review will be helpful for research on multimodal sentiment analysis.
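To make the difference between two of the fusion strategies named above more concrete, here is a minimal sketch in pure Python. It is not a model from this paper. It assumes toy linear classifiers with hand-set weights and hypothetical modality names (`face`, `body`, `voice`). Feature-level (early) fusion concatenates per-modality feature vectors and classifies them once. Decision-level (late) fusion classifies each modality separately and averages the resulting class probabilities. Model-level fusion is omitted because it depends on the internal architecture of the model.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
# A linear classifier: one weight row per class plus one bias per class.
Classifier = Tuple[List[Vector], Vector]


def softmax(logits: Vector) -> Vector:
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Compute one score per class: w_c . x + b_c."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("weight rows must match the feature dimension")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def feature_level_fusion(features: Dict[str, Vector], clf: Classifier) -> Vector:
    """Early fusion: concatenate all modality features, then classify once."""
    fused: Vector = []
    for name in sorted(features):  # fixed modality order so the weights line up
        fused.extend(features[name])
    weights, bias = clf
    return softmax(linear(weights, bias, fused))


def decision_level_fusion(
    features: Dict[str, Vector],
    classifiers: Dict[str, Classifier],
    modality_weights: Optional[Dict[str, float]] = None,
) -> Vector:
    """Late fusion: classify each modality separately, then average the probabilities."""
    probs = {name: softmax(linear(*classifiers[name], x)) for name, x in features.items()}
    if modality_weights is None:
        modality_weights = {name: 1.0 for name in probs}  # equal weighting by default
    total = sum(modality_weights[name] for name in probs)
    n_classes = len(next(iter(probs.values())))
    return [
        sum(modality_weights[name] * probs[name][c] for name in probs) / total
        for c in range(n_classes)
    ]


# Hypothetical toy inputs: class 0 = negative, class 1 = positive.
features = {"face": [1.0, 0.0], "body": [0.5], "voice": [0.0, 1.0]}

# Concatenated order is body, face, voice, giving 5 features in total.
early_clf: Classifier = ([[0.0] * 5, [1.0, 1.0, 0.0, 0.0, 1.0]], [0.0, 0.0])
late_clfs: Dict[str, Classifier] = {
    "face": ([[0.0, 0.0], [1.0, 0.0]], [0.0, 0.0]),
    "body": ([[0.0], [0.0]], [0.0, 0.0]),
    "voice": ([[0.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
}

early = feature_level_fusion(features, early_clf)
late = decision_level_fusion(features, late_clfs)
print("feature-level:", [round(p, 3) for p in early])
print("decision-level:", [round(p, 3) for p in late])
```

In this sketch, the early-fusion classifier sees all modalities at once, so its weights can in principle capture cross-modal interactions. The late-fusion variant keeps the modalities independent until the final averaging step, which makes it straightforward to reweight or drop a single modality through `modality_weights`.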