Abstract

This work analyses the specifics of communication among web users, with particular attention to social media and to the problem of detecting abusive content in social media users’ messages. The main distinctive features of communication in social media are identified, and the causes that can reduce the accuracy of automated abusive-content detection are analysed. The stages of natural language text processing, in particular those used for detecting abusive content, are examined to determine how they can be modified to account for the peculiarities of social media communication and for other aspects that can influence the final classification result. It is established that the following features of messages and of social media communication can be taken into account during automated detection of abusive content in social media text messages: symbols and numbers that can make a message harder to interpret, emoji, context at the level of connections between social media users, and context at the level of the messages themselves. Corresponding modifications to the text pre-processing stages and to the analysis of classification results are defined. A method for automated detection of abusive content in social media text messages is proposed, based on machine learning principles with a modified approach to pre-processing natural language text data. An input data format that supports efficient detection of abusive content is also proposed.
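To make the pre-processing idea concrete, the following is a minimal Python sketch of how look-alike symbols, digits, and emoji might be normalized before classification. It is an illustration only and not the method proposed in the work: the substitution table `LOOKALIKES`, the emoji map `EMOJI_TOKENS`, and the functions `normalize_token` and `preprocess` are all hypothetical names and values chosen for this example.

```python
# Hypothetical look-alike table (an assumption, not taken from the paper):
# digits and symbols that are often substituted for letters.
LOOKALIKES = str.maketrans({
    "@": "a", "4": "a", "3": "e", "1": "i",
    "0": "o", "$": "s", "5": "s", "7": "t",
})

# Hypothetical emoji-to-token map, so that emoji survive tokenization
# as ordinary features instead of being dropped.
EMOJI_TOKENS = {
    "\U0001F621": " <angry> ",   # pouting face
    "\U0001F602": " <laugh> ",   # face with tears of joy
}


def normalize_token(token: str) -> str:
    """De-obfuscate a token that mixes letters with digits or symbols.

    Tokens without any letter (for example "2024") are left untouched,
    so genuine numbers are not rewritten.
    """
    if any(c.isalpha() for c in token) and not token.isalpha():
        return token.translate(LOOKALIKES)
    return token


def preprocess(message: str) -> str:
    """Replace known emoji with tokens, lowercase, and normalize each token."""
    for emoji, tag in EMOJI_TOKENS.items():
        message = message.replace(emoji, tag)
    tokens = message.lower().split()
    return " ".join(normalize_token(t) for t in tokens)


if __name__ == "__main__":
    print(preprocess("You are an 1D10T \U0001F621 in 2024"))
```

A rule this simple will produce false positives (for instance, "b4" becomes "ba"), which is one reason a real pipeline would likely combine such normalization with a dictionary check or leave the decision to the downstream classifier.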
