Arabic Language Modeling Based on Supervised Machine Learning

Omayma Mahmoudi, Mouncef Filali Bouami, Mustapha Badri

doi:10.18280/ria.360315

Abstract

Misinformation and misleading actions appeared as soon as COVID-19 vaccination campaigns were launched, regardless of a country's literacy level or development index. In such a situation, supervised machine learning techniques for classification appear to be a suitable way to model the value and veracity of data, especially in Arabic, a language used by millions of people around the world. To achieve this task, we had to collect data manually from social media platforms such as Facebook and Twitter and from Arabic news websites. This paper aims to classify Arabic-language news as fake or real by creating a Machine Learning (ML) model to detect Arabic fake news (DAFN) about COVID-19 vaccination. To achieve our goal, we use Natural Language Processing (NLP) techniques, which is especially challenging because few NLP libraries support Arabic. We use the NLTK package for Python to preprocess the data, and then an ML model for the classification.

Full Text