Fine-grained hate speech detection in Arabic using transformer-based models

Rajae Bensoltane,Taher Zaki

doi:10.11591/ijece.v14i3.pp2927-2936

Abstract

With the proliferation of social media platforms, characterized by features such as anonymity, user-friendly access, and the facilitation of online community building and discourse, the matter of detecting and monitoring hate speech has emerged as an increasingly formidable challenge for society, individuals, and researchers. Despite the crucial importance of hate speech detection task, the majority of work in this field has been conducted in English, with insufficient focus on other languages, particularly Arabic. Furthermore, most existing studies on Arabic hate speech detection have addressed this task as a binary classification problem, which is unreliable. Therefore, the aim of this study is to provide an enhanced model for detecting fine-grained hate speech in Arabic. To this end, three transformer-based models were evaluated to generate contextualized word embeddings from input sequence. Additionally, these models were combined with a bidirectional gated recurrent unit (BiGRU) layer to further improve the extracted semantic and context features. The experiments were conducted on an Arabic reference dataset provided by the open-source Arabic corpora and processing tools (OSACT-5) shared task. A comparative analysis indicates the efficiency of the proposed model over the baseline and related work models by achieving a macro F1-score of 61.68%.

Full Text