Abstract

Different parts of an image guide emotion to different degrees, so the key to image emotion recognition is fully exploiting the regions associated with emotion. This paper therefore proposes a two-branch visual sentiment classification model based on the vision transformer, termed Enhanced Effective Region and Context-Aware Vision Transformers (EERCA-ViT). The model comprises a primary branch and an auxiliary branch. The primary branch models interdependencies between patches through patch squeeze-and-excitation (P-SE), thereby highlighting effective-region features within the global representation. The auxiliary branch removes the feature patches tagged by the primary branch through a context-aware module (CAM), forcing the network to discover additional sentiment-discriminative regions. In addition, a two-part loss function is constructed to improve the robustness of the model. Extensive experiments on six benchmark datasets show that the proposed method outperforms state-of-the-art image sentiment analysis methods, and further comparative experiments demonstrate the effectiveness of the framework's individual modules (P-SE and CAM).
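The abstract does not give the exact formulation of P-SE or CAM, but the two ideas it describes — reweighting patch features with a squeeze-and-excitation gate, then masking the most highly weighted patches so the auxiliary branch must find other discriminative regions — can be sketched as follows. This is a minimal numpy illustration; all shapes, weight matrices, and function names here are hypothetical, not the paper's implementation.

```python
import numpy as np

def patch_se(x, w1, w2):
    """Patch squeeze-and-excitation sketch.

    x  : (num_patches, dim) patch features from a ViT backbone
    w1 : (hidden, num_patches) first excitation layer (assumed)
    w2 : (num_patches, hidden) second excitation layer (assumed)
    Returns reweighted patches and the per-patch attention weights.
    """
    s = x.mean(axis=1)                   # squeeze: one descriptor per patch
    h = np.maximum(0.0, w1 @ s)          # excitation MLP with ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ h)))  # sigmoid -> weights in (0, 1)
    return x * a[:, None], a             # rescale each patch by its weight

def mask_top_patches(x, a, k):
    """Context-aware masking sketch: zero out the k most-attended
    patches so the auxiliary branch sees only the remaining regions."""
    x = x.copy()
    x[np.argsort(a)[-k:]] = 0.0          # drop the k highest-weight patches
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))         # 16 patches, 8-dim features (toy sizes)
w1 = rng.standard_normal((4, 16)) * 0.1
w2 = rng.standard_normal((16, 4)) * 0.1
y, a = patch_se(x, w1, w2)               # primary branch: enhanced patches
x_aux = mask_top_patches(x, a, k=3)      # auxiliary branch input
```

In this reading, the primary branch classifies from `y` while the auxiliary branch classifies from `x_aux`, and the two-part loss would combine both branches' classification losses.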
