Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification

Ibrahim Ahmed,Mohamed Waleed Fahkr,Andrew Ihab,Rany Hatem,Mostafa Abbas

doi:10.1109/esolec54569.2022.10009167

Abstract

Offensive language and Hate Speech are rampant on social media platforms (Facebook, Twitter, etc.) in Egypt for quite a while now, appearing in Tweets, Facebook posts and comments, etc., It is an increasingly outreaching problem that needs immediate attention. This paper focuses on the problem of detecting and classifying both offensive language and Hate Speech using State-of-the-art techniques in text classification. Pre-trained transformer models have gained a reputation of astounding general language understanding that could be fine-tuned for language-specific tasks like Text classification, We collected an Egyptian-Arabic dialect Custom dataset of about 8,000 text samples manually labelled into 5 distinct classes: (Neutral, Offensive, Sexism, Religious Discrimination, Racism), It was used to fine-tune and evaluate multiple different Arabic pre-trained transformer models based on different transformer architectures and pre-training approaches for the Natural Language Processing downstream task of text classification. We achieved an average accuracy of about 96% across all fine-tuned transformer models.

Full Text