Abstract

Offensive language and Hate Speech are rampant on social media platforms (Facebook, Twitter, etc.) in Egypt for quite a while now, appearing in Tweets, Facebook posts and comments, etc., It is an increasingly outreaching problem that needs immediate attention. This paper focuses on the problem of detecting and classifying both offensive language and Hate Speech using State-of-the-art techniques in text classification. Pre-trained transformer models have gained a reputation of astounding general language understanding that could be fine-tuned for language-specific tasks like Text classification, We collected an Egyptian-Arabic dialect Custom dataset of about 8,000 text samples manually labelled into 5 distinct classes: (Neutral, Offensive, Sexism, Religious Discrimination, Racism), It was used to fine-tune and evaluate multiple different Arabic pre-trained transformer models based on different transformer architectures and pre-training approaches for the Natural Language Processing downstream task of text classification. We achieved an average accuracy of about 96% across all fine-tuned transformer models.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call