Pardeep at SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media using Deep Learning

Pardeep Singh, Satish Chand

doi:10.18653/v1/s19-2128

Abstract

The rise of social media has made exchanging information faster and easier. However, the use of offensive language on social media has recently surged. The main challenge for a service provider is to correctly identify such offensive posts and take the necessary action to monitor and control their spread. In this work, we address this problem using deep learning techniques: LSTM, Bidirectional LSTM and Bidirectional GRU. Our approach covers the three Sub-tasks of SemEval-2019 Task 6, which comprise the identification of offensive tweets as well as their categorization. We obtain significantly better results on the leaderboard for Sub-task B and decent results for Sub-task A and Sub-task C, which suggests that the proposed models can be used to automate the detection of offensive posts in social media.

Highlights

Social media has revolutionized the way people communicate
First, we convert the text into vector representations using GloVe (Pennington et al., 2014) word-level embeddings, and use these representations as input to the deep learning models described in the subsequent sections for the classification tasks
The results table shows that deep learning models such as Long Short-Term Memory (LSTM), Bidirectional LSTM and Bidirectional Gated Recurrent Unit (GRU) with GloVe word embeddings outperformed TF-IDF based machine learning algorithms
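
The embedding step mentioned in the highlights can be illustrated with a minimal sketch. It is not the authors' code: the three-dimensional vectors, the `load_embeddings` and `embed_tweet` helper names, the whitespace tokenization, the zero vector for unknown words and the padding length are all assumptions made for illustration. The only detail taken from GloVe itself is the text format, in which each line holds a token followed by its vector components.

```python
import io

# Toy stand-in for a GloVe text file: one token per line, followed by its vector.
# These 3-dimensional values are invented for illustration only.
TOY_GLOVE = """\
you 0.1 0.2 0.3
are 0.0 -0.1 0.4
great 0.5 0.5 -0.2
"""

def load_embeddings(handle):
    """Parse GloVe-style text lines into a {token: vector} dictionary."""
    table = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table

def embed_tweet(text, table, max_len=4):
    """Map a tweet to max_len vectors; unknown words and padding become zeros."""
    dim = len(next(iter(table.values())))
    zero = [0.0] * dim
    tokens = text.lower().split()[:max_len]           # naive tokenizer (assumption)
    vectors = [table.get(tok, zero) for tok in tokens]
    vectors += [zero] * (max_len - len(vectors))      # pad at the end (assumption)
    return vectors

embeddings = load_embeddings(io.StringIO(TOY_GLOVE))
matrix = embed_tweet("You are GREAT friend", embeddings)
```

Here `matrix` is a fixed-length sequence of word vectors, the kind of input a recurrent model consumes one time step at a time. With a real pre-trained file, the `io.StringIO` object would be replaced by an opened file handle.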

Summary

Introduction

Social media has revolutionized the way people communicate. It is an instant communication medium that connects people all over the world and lets them share their views. There is a pressing need for a system that automatically identifies and categorizes offensive language in social networks. SemEval-2019 (Zampieri et al., 2019b) addressed exactly that need by organizing a task on identifying and categorizing offensive language in social media. This task is divided into three Sub-tasks. Our approach to SemEval-2019 Task 6 comprises three deep learning models: Bidirectional LSTM, Bidirectional GRU and standard LSTM. These are popular deep learning sequence models that have been applied in many text classification tasks. This paper describes our approaches and results for SemEval-2019 Task 6.
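
To make the idea of a bidirectional recurrent model concrete, the following is a small sketch of a bidirectional GRU encoder in plain Python. It is not the authors' implementation: the weights are random and untrained, biases are omitted, the dimensions are tiny, and concatenating the final forward and backward states is only one common way to combine the two directions. The names `GRUCell` and `bidirectional_encode` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Minimal GRU cell: update gate z, reset gate r, candidate state c (no biases)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.W = {g: rand_matrix(rng, hidden_dim, input_dim) for g in "zrc"}
        self.U = {g: rand_matrix(rng, hidden_dim, hidden_dim) for g in "zrc"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
        c = [math.tanh(a + b) for a, b in zip(matvec(self.W["c"], x), matvec(self.U["c"], gated))]
        # Interpolate between the old state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

    def run(self, sequence):
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h

def bidirectional_encode(sequence, fwd, bwd):
    """Read the sequence left-to-right and right-to-left, then concatenate."""
    return fwd.run(sequence) + bwd.run(list(reversed(sequence)))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
forward_cell = GRUCell(3, 2, rng)
backward_cell = GRUCell(3, 2, rng)
tweet = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]  # toy word vectors
encoding = bidirectional_encode(tweet, forward_cell, backward_cell)
```

In a trained classifier, a vector like `encoding` would be passed to an output layer that predicts the label for a given Sub-task. A Bidirectional LSTM follows the same pattern with a different cell.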

Methods

Results

Conclusion