Abusive Content Detection in Arabic Tweets Using Multi-Task Learning and Transformer-Based Models

Bedour Alrashidi, Ali Alkhathlan, Amani Jamal

doi:10.3390/app13105825

Open Access

https://doi.org/10.3390/app13105825

Journal: Applied Sciences	Publication Date: May 9, 2023
Citations: 1	License type: CC BY 4.0

Affiliation: University of Ha'il, King Abdulaziz University

Abstract

Social media platforms have become increasingly popular in the Arab world in recent years. Their growing use, however, has also brought a new challenge in the form of abusive content, including hate speech, offensive language, and abusive language. Existing research focuses on automatic abusive content detection as a binary classification problem. In addition, existing work on the automatic detection of abusive Arabic content fails to address dialect-specific phenomena. These two limitations remain important open issues in automatic abusive Arabic content detection. In this study, we used a multi-aspect annotation schema to tackle automatic abusive content detection in Arabic countries as a multi-class classification task that accounts for dialectal Arabic (DA)-specific phenomena. More precisely, the multi-aspect annotation schema includes five attributes: directness, hostility, target, group, and annotator. We developed a framework to automatically detect abusive content on Twitter using natural language processing (NLP) techniques. The framework applied machine learning (ML) models, deep learning (DL) models, and pretrained Arabic language models (LMs) to the multi-aspect annotation dataset. In addition, to investigate the impact of other approaches such as multi-task learning (MTL), we developed four MTL models built on top of a pretrained DA language model (MARBERT) and trained them on the multi-aspect annotation dataset. Our MTL models and pretrained Arabic LMs improved performance compared to the existing DL model reported in the literature.
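
The multi-task idea summarized in the abstract can be illustrated with a minimal sketch of hard parameter sharing: one shared representation feeds a separate classification head per annotation attribute, and the per-task cross-entropy losses are summed. This is not the authors' implementation. The `shared_encoder` function is a toy stand-in for a pretrained encoder such as MARBERT, the choice of four attributes as tasks is illustrative, and the label counts in `TASKS` are hypothetical, since the abstract names the attributes but not their label sets.

```python
import math
import random

# Hypothetical label counts per task (assumption: not given in the abstract).
TASKS = {"directness": 2, "hostility": 4, "target": 3, "group": 5}
HIDDEN = 8  # size of the shared representation (assumption)


def shared_encoder(text, dim=HIDDEN):
    """Toy stand-in for a pretrained encoder: one fixed vector per input string."""
    rng = random.Random(text)  # seeding with the string makes the vector deterministic
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def make_head(num_classes, dim=HIDDEN, seed=0):
    """Task-specific linear head: one weight row per class."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(text, labels, heads, weights=None):
    """Weighted sum of per-task cross-entropy losses over one shared representation."""
    hidden = shared_encoder(text)  # computed once, reused by every head
    total = 0.0
    for task, gold in labels.items():
        logits = [sum(w * x for w, x in zip(row, hidden)) for row in heads[task]]
        probs = softmax(logits)
        task_weight = 1.0 if weights is None else weights[task]
        total += -task_weight * math.log(probs[gold])
    return total


heads = {task: make_head(n, seed=i) for i, (task, n) in enumerate(TASKS.items())}
example_labels = {"directness": 1, "hostility": 2, "target": 0, "group": 3}
loss = multitask_loss("example tweet", example_labels, heads)
```

In a real setup, the gradient of this summed loss would update both the heads and the shared encoder, which is how signal from one attribute can help the others. The optional `weights` argument is one common way to balance tasks, included here as an assumption rather than something the abstract specifies.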

Full Text