Abstract
In recent years, people have increasingly paired text with images to express and share their views on social media. At present, most multi-modal research fuses data from different modalities by simple concatenation, failing to consider the correlation and complementarity between modalities. Fully exploiting the correlation and complementarity of text and images is essential for user sentiment analysis. In this paper, we propose a sentiment analysis model based on an adaptive multi-modal feature fusion strategy. The model effectively explores the bi-directional association between text and images and fuses the features of each modality through a multi-modal deep adaptive feature fusion module. In addition, we add label smoothing to the loss function to enhance the generalization ability of the model. Experimental results on a public dataset show that the proposed model achieves the best performance compared to the baseline models, improving accuracy by 2.0%. Moreover, to demonstrate the generalization capability of the proposed adaptive feature fusion module, we also test it on another task, multi-modal fake news detection, where it improves accuracy by 2.5%.
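The abstract does not give implementation details, but the two ingredients it names, adaptive fusion of bi-directionally associated text and image features and a label-smoothed loss, can be illustrated with a minimal sketch. The sketch below assumes PyTorch; the cross-attention layers, the learned gate, the feature dimensions, and the class `AdaptiveFusion` are illustrative assumptions, not the authors' code. Label smoothing, by contrast, is supported directly by PyTorch's built-in cross-entropy loss.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Illustrative sketch (not the paper's model): cross-attention lets
    each modality attend to the other (a generic stand-in for the
    bi-directional text-image association), and a learned sigmoid gate
    weights the two modalities adaptively before classification."""

    def __init__(self, dim: int, num_classes: int, num_heads: int = 4):
        super().__init__()
        self.txt2img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_feats: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (B, T, D) token features; img_feats: (B, R, D) region features
        t, _ = self.txt2img(text_feats, img_feats, img_feats)   # text attends to image
        v, _ = self.img2txt(img_feats, text_feats, text_feats)  # image attends to text
        t, v = t.mean(dim=1), v.mean(dim=1)                     # pool each to (B, D)
        g = torch.sigmoid(self.gate(torch.cat([t, v], dim=-1))) # adaptive per-feature gate
        fused = g * t + (1 - g) * v
        return self.classifier(fused)

# Label smoothing, as mentioned in the abstract, via PyTorch's built-in option.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Dummy forward/backward pass with assumed shapes (8 samples, 3 sentiment classes).
model = AdaptiveFusion(dim=256, num_classes=3)
text = torch.randn(8, 32, 256)   # 32 text tokens per sample
image = torch.randn(8, 49, 256)  # 49 image regions per sample
loss = criterion(model(text, image), torch.randint(0, 3, (8,)))
loss.backward()
```

The gate makes the fusion "adaptive" in the sense that the relative contribution of each modality is learned per feature rather than fixed by plain concatenation; whether the paper uses this particular mechanism is an assumption of the sketch.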