Abstract
Fake news detection for Arabic news has recently drawn some attention. However, such studies remain limited because few datasets are available to support them. Clickbait detection is closely linked to fake news detection, since clickbait is an effective vehicle for spreading fake news. The lack of Arabic-language datasets for studying clickbait detection models is equally evident. This paper presents the first dataset of Arabic clickbait news. The purpose of this dataset is to enable machine learning models to automatically classify news headlines as "Clickbait" or "Not Clickbait". More than 3000 news records are sampled from five months of tweets posted by 24 Jordanian news publishers. Each sampled record is labeled by three annotators, and about 18% of the records are labeled as clickbait. The annotators unanimously agreed on the class of about 81% of the labeled records. To demonstrate the dataset's usability for machine learning, Logistic Regression, Support Vector Machine, Random Forest, Naïve Bayes, Stochastic Gradient Descent, Nearest Neighbor, and Decision Tree classifiers are applied to it. These models achieve macro F1-scores of up to 0.81, indicating that automatically detecting clickbait news headlines with machine learning is feasible.
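Because only about 18% of the records are clickbait, the macro F1-score is a more informative metric here than plain accuracy: it averages the per-class F1-scores, so the minority "Clickbait" class weighs as much as the majority class. The following is a minimal pure-Python sketch of that metric, not the paper's evaluation code. The function names and the ten toy labels are hypothetical illustrations, not data from the dataset.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the positive class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("Clickbait", "Not Clickbait")):
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical toy example: 2 of 10 headlines are clickbait (an imbalanced split).
y_true = ["Clickbait"] * 2 + ["Not Clickbait"] * 8
# A trivial classifier that always predicts the majority class.
y_pred = ["Not Clickbait"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # 0.8
print(macro_f1(y_true, y_pred))  # 0.444... (= 4/9)
```

The majority-class baseline reaches 0.8 accuracy yet only about 0.44 macro F1, because its F1-score on the "Clickbait" class is zero. This is why a macro F1-score of 0.81 indicates that a model is genuinely detecting clickbait rather than defaulting to the majority class.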