Classification of Arabic-Speaking Website Pages with Unscrupulous Intentions and Questionable Language

Haya Mesfer Alshahrani

doi:10.14569/ijacsa.2021.0120146

Abstract

This study proposes a detailed classification system for Arabic-language web pages with unscrupulous intentions and questionable language. The methodology is quantitative: supervised algorithms are trained on manually categorized data to build a classification model. The model is built from quantitative information extracted by the Posit or SAFAR textual analysis frameworks and uses 58 features that combine Posit n-grams with the morphological part-of-speech (POS) tools of SAFAR V2. The model achieved more than 94% precision. The best results, reaching 94% precision, were obtained by combining Posit + SAFAR + (18 attributes Posit + SAFAR N-Gram). The most reliable results were obtained with a Random Forest classification algorithm using regression. The study recommends further work on this topic using new algorithms and techniques.
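As a rough illustration of the pipeline the abstract describes, the sketch below shows how separate feature groups might be joined into one vector per page and how precision is computed. It is not the author's implementation: the function names, the use of word-level n-grams, and the toy values are all assumptions, and the real Posit and SAFAR outputs are represented only by placeholder lists.

```python
from collections import Counter


def word_ngrams(text, n=2):
    """Count word n-grams in a text (word-level n-grams are an assumption)."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def combine_features(posit, safar, ngram_counts, vocabulary):
    """Concatenate three feature groups into one flat vector.

    `posit` and `safar` are placeholder lists standing in for the
    numeric outputs of those tools; `vocabulary` fixes the order of
    the n-gram columns so every page yields a vector of equal length.
    """
    ngram_part = [ngram_counts.get(gram, 0) for gram in vocabulary]
    return list(posit) + list(safar) + ngram_part


def precision(y_true, y_pred, positive=1):
    """Precision = true positives / all predicted positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy usage with made-up values (not data from the study).
counts = word_ngrams("a b a b", n=2)
vector = combine_features([1.0, 2.0], [3.0], counts, [("a", "b"), ("z", "z")])
score = precision([1, 0, 1, 1], [1, 1, 1, 0])
```

Vectors built this way could then be passed to a supervised learner such as scikit-learn's `RandomForestClassifier`; that training step is omitted here to keep the sketch free of external dependencies.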

Highlights

The last few years have seen a significant increase in the activities of radicalized extremists, launching terrorist attacks around the whole world
This study developed a classification model that uses 58 features combining Posit n-grams with the morphological SAFAR V2 POS tools; the model achieved more than 99% precision
When the N-Gram attributes were added to the Posit attributes, the Random Forest classifier outperformed the other classifiers by a considerable margin on the supervised dataset

Summary

Introduction

The last few years have seen a significant increase in the activities of radicalized extremists, who have launched terrorist attacks around the world. They exploit modern technologies that are widely used by the public, such as the Internet and social media, to plan attacks and maintain contact with their groups [1]. Social networks have started to intervene by implementing countermeasures against these groups. Twitter was considered the main promotional vehicle for ISIS, so in August 2016 it began taking more stringent measures, closing more than 36,000 feeds believed to belong to ISIS [1, 2].

Objectives

Methods

Results

Conclusion