Abstract

Instagram is a free photo-sharing platform where each user has a profile and can upload photos for followers to view, like, and comment. Abusive comments on images can be humiliating and harmful to those who share photos. Developing a comment filter in languages other than English is difficult and time-consuming. This paper proposes a dataset called Abusive Turkish Comments (ATC) to detect abusive Instagram comments in Turkish. It is composed of a large number of Instagram comments posted to tabloid and sports accounts (i.e., 10,528 abusive and 19,826 not-abusive). It is the first public dataset dedicated to detecting abusive Turkish messages, as far as we know. The sentiment annotation has been done in sentence-level by assigning polarity to each comment. The performance of the abusive message detection models was evaluated using several performance metrics: Convolutional Neural Network (CNN), five well-known classifiers (i.e., Naive Bayes, Support Vector Machine, Decision Tree, Random Forest, and Logistic Regression), and two reweighted classifiers (i.e., Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost)) were compared in terms of F1-score, precision, and recall. The results showed that the best performance (i.e., Micro-averaged F1-score: 0.974, Macro-averaged F1-score: 0.973, Kappa-value: 0.946) was yielded by the CNN model on the oversampled ATC dataset. The abusive message detection model proposed in this study can contribute to the development of Turkish comment filters on Instagram. Different model combinations are considered to select the best model that gives better recognition accuracy.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call