Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user texts (such as irony, false generalization and attribution of unfavored actions to targeted groups), users’ inclination to express opposite attitudes to different ethnic groups in the same text and, finally, lack of research on languages other than English. In this work we address several of these problems in the task of ethnicity-targeted hate speech detection in Russian-language social media texts. This approach allows us to differentiate between attitudes towards different ethnic groups mentioned in the same text – a task that has never been addressed before. We use a dataset of over 2,6M user messages mentioning ethnic groups to construct a representative sample of 12K instances (ethnic group, text) that are further thoroughly annotated via a special procedure. In contrast to many previous collections that usually comprise extreme cases of toxic speech, representativity of our sample secures a realistic and, therefore, much higher proportion of subtle negativity which additionally complicates its automatic detection. We then experiment with four types of machine learning models, from traditional classifiers such as SVM to deep learning approaches, notably the recently introduced BERT architecture, and interpret their predictions in terms of various linguistic phenomena. In addition to hate speech detection with a text-level two-class approach (hate, no hate), we also justify and implement a unique instance-based three-class approach (positive, neutral, negative attitude, the latter implying hate speech). Our best results are achieved by using fine-tuned and pre-trained RuBERT combined with linguistic features, with F1-hate=0.760, F1-macro=0.833 on the text-level two-class problem comparable to previous studies, and F1-hate=0.813, F1-macro=0.824 on our unique instance-based three-class hate speech detection task. Finally, we perform error analysis, and it reveals that further improvement could be achieved by accounting for complex and creative language issues more accurately, i.e., by detecting irony and unconventional forms of obscene lexicon.
Read full abstract