Abstract

The growing use of social media produces large volumes of user-generated, unstructured text, and with this growth cyberbullying has become a serious problem, one that can have grave consequences for a person's life and community. Because users come from diverse cultural, intellectual, and educational backgrounds, distinguishing offensive language from hate speech is an essential challenge in detecting noxious textual content. In this work, we propose a method to automatically classify tweets on Twitter into binary labels: offensive and non-offensive. Using a tweet dataset, we extract N-gram features and compare two text representations, Term Frequency-Inverse Document Frequency (TF-IDF) and Count-Vectorizer, across six machine learning models: Decision Tree, Multinomial Naive Bayes (NB), Linear Support Vector Classifier (SVC), AdaBoost, K-Nearest Neighbors, and Logistic Regression, then tune the best-fitting model. With TF-IDF, we achieve the best accuracy of 0.924 with the Linear SVC classifier, the best F1 score of 0.942 with the same classifier, the best precision of 0.975 with the AdaBoost classifier, and the best recall of 0.977 with Multinomial NB. With the other representation (Count-Vectorizer), we achieve the best accuracy of 0.925 with the Logistic Regression classifier, the best F1 score of 0.942 with the same classifier, the best precision of 0.976 with the AdaBoost classifier, and the best recall of 0.941 with Multinomial NB.
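The comparison the abstract describes, the same classifier trained on a TF-IDF representation versus a raw-count representation of N-gram features, can be sketched with scikit-learn as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the toy tweets and labels below are invented placeholders, and the dataset, preprocessing, and tuning used in the study are not reproduced here.

```python
# Hedged sketch: TF-IDF vs. raw-count N-gram features with one of the
# compared classifiers (Linear SVC), assuming scikit-learn is available.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Illustrative placeholder data only -- NOT the paper's tweet dataset.
tweets = [
    "have a great day everyone",
    "you are a great friend",
    "you are an idiot",
    "shut up you idiot",
]
labels = [0, 0, 1, 1]  # 0 = non-offensive, 1 = offensive

# TF-IDF representation with unigram + bigram features.
tfidf_clf = Pipeline([
    ("vec", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LinearSVC()),
])
tfidf_clf.fit(tweets, labels)

# Raw-count (Count-Vectorizer) representation with the same classifier,
# mirroring the paper's feature-representation comparison.
count_clf = Pipeline([
    ("vec", CountVectorizer(ngram_range=(1, 2))),
    ("clf", LinearSVC()),
])
count_clf.fit(tweets, labels)

# Both pipelines expose the same predict interface for unseen text.
prediction = tfidf_clf.predict(["what an idiot"])
```

In practice the study evaluates all six classifiers under both representations with accuracy, F1, precision, and recall; swapping the `LinearSVC` step for `MultinomialNB`, `AdaBoostClassifier`, and the others follows the same pipeline pattern.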
