Abstract
There is currently great interest in multimodal aspect-level sentiment classification, which exploits both textual and visual information rather than a single modality to identify sentiment polarity. Because existing methods still leave room for improvement in classification accuracy, we study aspect-level multimodal sentiment classification with a focus on the interaction between textual and visual features. Specifically, we construct a multimodal aspect-level sentiment classification framework with a multi-image gate and fusion networks, called MFSC. MFSC consists of four parts: text feature extraction, visual feature extraction, text feature enhancement, and multi-feature fusion. First, a bidirectional long short-term memory (BiLSTM) network extracts the initial text features. Building on these, a text feature enhancement strategy uses a text memory network and adaptive weights to produce the final text features. Meanwhile, a multi-image gate is proposed to fuse features from multiple images and filter out irrelevant noise. Finally, an attention-based text-visual feature fusion method captures the association between text and images to further improve classification performance. Experimental results show that MFSC achieves gains in classification accuracy and macro-F1.
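The paper's implementation is not reproduced here; purely as an illustration, the sketch below shows how the components named in the abstract (a BiLSTM text encoder, a multi-image gate, and attention-based text-visual fusion) might fit together in PyTorch. All class names, dimensions, and the exact gating and attention formulations are assumptions rather than the authors' method, and the text memory network with adaptive weights is omitted for brevity.

```python
# Minimal, illustrative PyTorch sketch of an MFSC-style pipeline.
# Module names, dimensions, and the gating/attention formulas are assumptions
# for illustration only; they are not taken from the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MFSCSketch(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=128,
                 img_dim=2048, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM extracts the initial text features.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Project per-image visual features (e.g., CNN outputs) into text space.
        self.img_proj = nn.Linear(img_dim, 2 * hidden_dim)
        # Multi-image gate: a scalar gate per image filters irrelevant visual noise.
        self.img_gate = nn.Linear(4 * hidden_dim, 1)
        # Attention over text tokens, conditioned on the fused visual feature.
        self.attn = nn.Linear(4 * hidden_dim, 1)
        self.classifier = nn.Linear(4 * hidden_dim, num_classes)

    def forward(self, token_ids, image_feats):
        # token_ids: (batch, seq_len); image_feats: (batch, num_images, img_dim)
        text, _ = self.bilstm(self.embedding(token_ids))        # (batch, seq_len, 2h)
        text_repr = text.mean(dim=1)                            # (batch, 2h) pooled text feature
        imgs = torch.tanh(self.img_proj(image_feats))           # (batch, num_images, 2h)
        # Gate each image by its relevance to the pooled text representation.
        gate_in = torch.cat([imgs, text_repr.unsqueeze(1).expand_as(imgs)], dim=-1)
        gates = torch.sigmoid(self.img_gate(gate_in))           # (batch, num_images, 1)
        visual = (gates * imgs).sum(dim=1)                      # (batch, 2h) fused visual feature
        # Attention-based text-visual fusion over the token sequence.
        attn_in = torch.cat(
            [text, visual.unsqueeze(1).expand(-1, text.size(1), -1)], dim=-1)
        weights = F.softmax(self.attn(attn_in), dim=1)          # (batch, seq_len, 1)
        attended_text = (weights * text).sum(dim=1)             # (batch, 2h)
        fused = torch.cat([attended_text, visual], dim=-1)      # (batch, 4h)
        return self.classifier(fused)
```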