Comprehensive Study of Arabic Satirical Article Classification

Fatmah Assiri,Hanen Himdi

doi:10.3390/app131910616

Abstract

A well-known issue for social media sites consists of the hazy boundaries between malicious false news and protected speech satire. In addition to the protective measures that lessen the exposure of false material on social media, providers of fake news have started to pose as satire sites in order to escape being delisted. Potentially, this may cause confusion to the readers as satire can sometimes be mistaken for real news, especially when their context or intent is not clearly understood and written in a journalistic format imitating real articles. In this research, we tackle the issue of classifying Arabic satiric articles written in a journalistic format to detect satirical cues that aid in satire classification. To accomplish this, we compiled the first Arabic satirical articles dataset extracted from real-world satirical news platforms. Then, a number of classification models that integrate a variety of feature extraction techniques with machine learning, deep learning, and transformers to detect the provenance of linguistic and semantic cues were investigated, including the first use of the ArabGPt model. Our results indicate that BERT is the best-performing model with F1-score reaching 95%. We also provide an in-depth lexical analysis of the formation of Arabic satirical articles. The lexical analysis provides insights into the satirical nature of the articles in terms of their linguistic word uses. Finally, we developed a free open-source platform that automatically organizes satirical and non-satirical articles in their correct classes from the best-performing model in our study, BERT. In summary, the obtained results found that pretrained models gave promising results in classifying Arabic satirical articles.

Full Text