Abstract

With the widespread adoption of social networks, image-text comments have become a prevalent mode of emotional expression alongside traditional text-only descriptions. However, two major challenges remain: how to effectively extract rich representations from both text and images, and how to extract cross-modal shared emotion features. This study proposes a multimodal sentiment analysis method based on a deep feature interaction network (DFINet). It leverages word-to-word graphs and a deep attention interaction network (DAIN) to learn text representations from multiple subspaces, and it introduces a cross-modal attention interaction network to extract cross-modal shared emotion features efficiently. This approach alleviates the difficulties of acquiring image-text features and representing cross-modal shared emotion features. Experimental results on the Yelp dataset demonstrate the effectiveness of the DFINet method.
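The abstract does not give implementation details, but the sketch below illustrates one plausible form of a cross-modal attention interaction block of the kind described above, in which text features and image features attend to each other and are fused into a shared emotion representation. The module names, dimensions, pooling, and the symmetric two-way design are assumptions for illustration, not details taken from DFINet.

```python
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Hypothetical sketch of a cross-modal attention interaction block.

    Text tokens attend over image-region features and vice versa, so the two
    modalities can exchange shared emotion cues. Dimensions and the two-way
    design are assumptions, not specifics from the DFINet paper.
    """

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Text queries attend to image keys/values, and image queries to text.
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, n_tokens, dim); image_feats: (batch, n_regions, dim)
        text_attended, _ = self.text_to_image(text_feats, image_feats, image_feats)
        image_attended, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # Pool each attended sequence and fuse into a shared emotion representation.
        shared = torch.cat([text_attended.mean(dim=1), image_attended.mean(dim=1)], dim=-1)
        return self.fuse(shared)


if __name__ == "__main__":
    block = CrossModalAttention()
    text = torch.randn(2, 32, 256)   # e.g. 32 word-graph node embeddings per comment
    image = torch.randn(2, 49, 256)  # e.g. 7x7 CNN region features per image
    print(block(text, image).shape)  # torch.Size([2, 256])
```

In this sketch the fused vector would feed a sentiment classifier; using attention in both directions is one common way to let each modality weight the regions or words of the other that carry emotional signal.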
