Abstract

Memes are a form of friendly exchange between people and can smooth over a conflict or help a discussion reach a meaningful conclusion. Detection of offensive and hate-speech-related content in social media memes has so far been investigated in a single modality only, i.e. considering only the text or only the image. However, memes combine text and images, so considering both modalities to identify offensive memes is vital. We therefore built a model that accounts for this multimodal nature. The training dataset available from the Facebook challenge provides text pre-extracted from each meme; when testing on an original meme, however, the text must first be extracted using Optical Character Recognition (OCR). If the text on the meme is straight rather than distorted, OCR extracts it accurately. The text is passed through a text module, where techniques such as a Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and stacked LSTM are used for detection. The image is passed to an image module based on a CNN model, VGG16. The results from both modules are then fused via basic concatenation. The entire procedure is carried out with two different word embeddings: GloVe and Bidirectional Encoder Representations from Transformers (BERT). The best accuracy obtained was 0.485, using BiLSTM + VGG16. Comparing BERT with GloVe, GloVe beat BERT by a very small margin, although with a larger dataset BERT could overtake GloVe.
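The abstract describes a late-fusion design: a text branch (embedded OCR-extracted tokens fed to a BiLSTM), an image branch (VGG16), and concatenation of the two representations before classification. The following is a minimal Keras sketch of that design, not the authors' exact implementation; the vocabulary size, sequence length, embedding dimension, and dense-layer widths are illustrative assumptions not specified in the abstract, and the GloVe embedding is represented by a generic Embedding layer whose weights would be initialized from a pre-trained matrix.

```python
# Minimal sketch of a BiLSTM + VGG16 concatenation-fusion model (assumed
# hyperparameters; the abstract does not specify these values).
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 20000, 64, 100  # illustrative assumptions

# --- Text module: embedding (e.g. pre-loaded GloVe weights) -> BiLSTM ---
text_in = layers.Input(shape=(SEQ_LEN,), name="ocr_tokens")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(text_in)  # GloVe matrix via weights=
x = layers.Bidirectional(layers.LSTM(128))(x)

# --- Image module: VGG16 convolutional base, global-average pooled ---
img_in = layers.Input(shape=(224, 224, 3), name="meme_image")
vgg = VGG16(include_top=False, weights="imagenet", pooling="avg")
vgg.trainable = False  # keep the pre-trained base frozen
y = vgg(img_in)

# --- Fusion: basic concatenation of both modules, then classification ---
fused = layers.Concatenate()([x, y])
fused = layers.Dense(256, activation="relu")(fused)
out = layers.Dense(1, activation="sigmoid", name="offensive")(fused)

model = Model(inputs=[text_in, img_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Concatenation is the simplest fusion strategy: each branch is reduced to a fixed-size vector, and the classifier learns over the joint representation without any cross-modal interaction before the fusion point.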
