Abstract

As social media grows in popularity, computer-mediated anonymity allows users to engage in behaviour they would avoid in real life, leaving others vulnerable to abuse on Internet platforms. Given the enormous volume of social media data, it is not feasible to manually filter the overflow of abusive content in online communities and social networking sites. This work proposes a multi-level classification model that deploys various machine learning and deep learning models to effectively identify offensive content in a tweet. The proposed Auto-Off ID system (1) classifies tweets as offensive or non-offensive; (2) classifies offensive tweets as either targeted or non-targeted; and (3) identifies, within targeted tweets, mentions of the individuals and organizations who have been bullied. Text-analysis features are supplemented with lexicon features from LIWC, POS tags for primary and secondary users, and Twitter Tag Scores (TTS). The system is evaluated with a diverse set of machine learning and deep learning models, from which C-LSTM performs best for offensive language identification with an accuracy of 91.72%, while LDA + Logistic Regression features trained with an SVM achieve 90.87% accuracy for offensive tweet classification.
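The three-level cascade described above can be sketched as follows. This is a minimal illustration of the control flow only: the keyword lexicon and the @-mention heuristics are hypothetical stand-ins for the paper's trained models (C-LSTM at level 1; LDA + Logistic Regression features with an SVM at level 2; LIWC, POS-tag, and TTS features throughout).

```python
# Hypothetical placeholder lexicon -- NOT from the paper; a real system
# would use a trained C-LSTM classifier at this level.
OFFENSIVE_WORDS = {"idiot", "stupid"}

def level1_is_offensive(tweet: str) -> bool:
    """Level 1: offensive vs non-offensive (paper: C-LSTM, 91.72% accuracy)."""
    return any(w in tweet.lower().split() for w in OFFENSIVE_WORDS)

def level2_is_targeted(tweet: str) -> bool:
    """Level 2: targeted vs non-targeted (paper: LDA + LR features with SVM).
    Simplified here: a tweet containing an @-mention is treated as targeted."""
    return any(tok.startswith("@") for tok in tweet.split())

def level3_targets(tweet: str) -> list:
    """Level 3: extract mentioned individuals/organizations (simplified to
    @-mentions; the paper also uses POS tags for primary/secondary users)."""
    return [tok.lstrip("@").rstrip(".,!?")
            for tok in tweet.split() if tok.startswith("@")]

def classify(tweet: str) -> dict:
    """Run the cascade: later levels only see tweets the earlier ones pass."""
    if not level1_is_offensive(tweet):
        return {"offensive": False}
    if not level2_is_targeted(tweet):
        return {"offensive": True, "targeted": False}
    return {"offensive": True, "targeted": True,
            "targets": level3_targets(tweet)}
```

The cascade design means the targeted/non-targeted classifier and the target-identification step each operate on a filtered, smaller subset of tweets, which is the structure the abstract describes.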
