Abstract

As a multimodal form of hate speech on social media, hateful memes pose an especially aggressive and cryptic threat to people's lives. Automatic detection of hateful memes is therefore crucial, but it is difficult because the image and text in most memes are only weakly consistent with, or even irrelevant to, each other. Existing works have achieved the initial goal of detecting hateful memes with pre-trained models, but they are limited to monolithic inference methods and ignore the semantic differences between multimodal representations. To strengthen comprehension of, and reasoning about, the hidden meaning behind memes by incorporating real-world knowledge, we propose an enhanced multimodal fusion framework with a congruent reinforced perceptron for hateful meme detection. Inspired by the human cognitive mechanism, we first divide the extracted multisource representations into main semantics and auxiliary contexts based on their strength and relevance, and then precode each group into lightly correlated embeddings with unified spatial dimensions via a novel prefix uniform layer. To jointly learn the intrinsic correlation between primary and secondary semantics, we design a congruent reinforced perceptron with brain-like perceptual integration that fuses multimodal representations in a shared latent space while maintaining feature integrity in the sub-fusion space, thereby implicitly reasoning about the subtle metaphors behind the memes. Extensive experiments on four benchmark datasets demonstrate that our architecture outperforms previous state-of-the-art methods.
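
The two-stage idea described above (map sources of different sizes to one shared dimension, then fuse auxiliary contexts into the main semantics) can be illustrated with a toy sketch. This is not the authors' prefix uniform layer or congruent reinforced perceptron, whose details the abstract does not give. It assumes random linear projections in place of learned layers and a simple dot-product attention with a residual sum in place of the proposed fusion; all names, dimensions, and inputs are hypothetical.

```python
import math
import random


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a learned projection (assumption)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(weights, vec):
    """Apply the linear map: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(main, auxiliaries):
    """Weight each auxiliary embedding by its similarity to the main one,
    then add the weighted context to the main embedding (residual sum).
    This is a generic attention-style fusion, not the paper's method."""
    dim = len(main)
    scores = [sum(m * a for m, a in zip(main, aux)) / math.sqrt(dim)
              for aux in auxiliaries]
    weights = softmax(scores)
    context = [sum(w * aux[i] for w, aux in zip(weights, auxiliaries))
               for i in range(dim)]
    fused = [m + c for m, c in zip(main, context)]
    return fused, weights


rng = random.Random(0)
UNIFIED_DIM = 8  # hypothetical shared embedding size

# Hypothetical raw features: one main source and three auxiliary sources,
# each with a different dimensionality.
main_raw = [rng.gauss(0, 1) for _ in range(16)]
aux_raw = [[rng.gauss(0, 1) for _ in range(n)] for n in (12, 20, 6)]

# Stage 1: bring every source to the same dimension.
main_emb = project(make_projection(16, UNIFIED_DIM, rng), main_raw)
aux_embs = [project(make_projection(len(v), UNIFIED_DIM, rng), v)
            for v in aux_raw]

# Stage 2: fuse auxiliary contexts into the main semantics.
fused, weights = fuse(main_emb, aux_embs)
```

The residual sum keeps the main embedding intact while the attention weights decide how much each auxiliary context contributes; a trained model would learn the projections and the fusion jointly rather than use random weights.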
