Abstract
The paper investigates binary classification of text messages for the presence of bullying. Bullying on the Internet, particularly in social networks, is a serious threat to the mental health of users. Aggressive, offensive, or humiliating messages can cause stress, anxiety, depression, or other mental disorders. For this reason, identifying and preventing cyberbullying is a priority for organizations that develop communication platforms. A dataset of Twitter messages was prepared and pre-processed, including cleaning, tokenization, and lemmatization. Three sets of input data for the classification models were created: Bag-of-Words, a TF-IDF matrix, and a word2vec matrix. Models based on several machine learning methods (logistic regression, k-nearest neighbors, random forest, support vector machine, and naive Bayes classifier) were built and tested on each of the input data sets. Based on the test results, a comparative analysis of model effectiveness was carried out, and logistic regression on Bag-of-Words input data was identified as the most effective model for binary classification of text messages from the selected set. The results of the study can be used to develop systems that automatically detect signs of cyberbullying in the messages of social network users and allow appropriate measures to be taken promptly.
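The best-performing combination named above, logistic regression on Bag-of-Words features, can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption made for illustration: the toy messages and labels are invented (they are not from the Twitter dataset), the tokenizer is a crude stand-in for the cleaning step and omits lemmatization, and the classifier is a hand-rolled batch gradient descent in the standard library rather than whatever toolkit the study used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (crude cleaning, no lemmatization)."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def build_vocabulary(docs):
    """Map every distinct token in the corpus to a column index."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(doc, vocab):
    """Return the token-count vector of one document; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(doc, vocab, w, b):
    """Return 1 (bullying) if the predicted probability is at least 0.5, else 0."""
    x = bag_of_words(doc, vocab)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy messages, invented for this sketch (1 = bullying, 0 = not bullying).
train_texts = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "shut up you idiot",
    "you are such a loser",
    "have a great day",
    "thanks for the help",
    "see you at lunch tomorrow",
    "great game last night",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

vocab = build_vocabulary(train_texts)
X = [bag_of_words(t, vocab) for t in train_texts]
w, b = train_logistic_regression(X, train_labels)
train_predictions = [predict(t, vocab, w, b) for t in train_texts]
```

Swapping `bag_of_words` for a TF-IDF or averaged word2vec vectorizer would give the other two input representations compared in the study; the classifier code would stay the same. A real evaluation would also score the model on held-out messages rather than on the training set.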