Identification of markers and artificial intelligence-based classification of radical Twitter data

Mohammad Fraiwan

doi:10.1108/aci-12-2021-0326

Abstract

Purpose: Social networks (SNs) have recently evolved from a means of connecting people into a tool for social engineering, radicalization, dissemination of propaganda and recruitment of terrorists. It is no secret that the majority of the Islamic State in Iraq and Syria (ISIS) members are Arabic speakers, and even the non-Arabs adopt Arabic nicknames. However, the majority of the literature researching the subject deals with non-Arabic languages. Moreover, the features involved in identifying radical Islamic content are shallow, and the search or classification terms are common in daily chatter among people of the region. The authors aim to distinguish normal conversation, influenced by the role religion plays in daily life, from terror-related content.

Design/methodology/approach: This article presents the authors' experience and the results of collecting, analyzing and classifying Twitter data from affiliated members of ISIS, as well as sympathizers. The authors used artificial intelligence (AI) and machine learning classification algorithms to categorize the tweets as terror-related, generic religious or unrelated.

Findings: The authors report the classification accuracy of the K-nearest neighbor (KNN), Bernoulli Naive Bayes (BNN) and support vector machine (SVM) [one-against-all (OAA) and all-against-all (AAA)] algorithms. The authors achieved a high classification F1 score of 83%. The work in this paper will hopefully aid more accurate classification of radical content.

Originality/value: In this paper, the authors have collected and analyzed thousands of tweets advocating and promoting ISIS. The authors have identified many common markers and keywords characteristic of ISIS rhetoric. Moreover, the authors have applied text processing and AI machine learning techniques to classify the tweets into one of three categories: terror-related, non-terror political chatter and news, and unrelated data-polluting tweets.

Highlights

Recent years have witnessed the rise of ISIS, also known as the Islamic State in Iraq and the Levant (ISIL), or Islamic State (IS) for short
We focus primarily on Twitter data in Arabic, the primary language used by ISIS members and sympathizers
Precision is the ratio of true positives to all elements identified as positive; recall is the ratio of true positives to all relevant elements; the F1 score is the harmonic mean of precision and recall and expresses classification accuracy on unbalanced datasets; accuracy is the ratio of the true positives summed over all classes to the number of instances
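
The metric definitions in the last highlight can be illustrated with a minimal Python sketch. The function name `per_class_metrics`, the short class labels and the six toy predictions are assumptions made purely for illustration; they are not the authors' code, data or results.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return per-class precision, recall and F1, plus overall accuracy."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correctly assigned to its class
        else:
            fp[guess] += 1   # wrongly identified as class `guess`
            fn[truth] += 1   # a relevant element of class `truth` was missed

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        identified = tp[label] + fp[label]   # all elements identified as positive
        relevant = tp[label] + fn[label]     # all relevant elements
        precision = tp[label] / identified if identified else 0.0
        recall = tp[label] / relevant if relevant else 0.0
        total = precision + recall
        # Harmonic mean of precision and recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}

    # True positives over all classes divided by the number of instances
    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return metrics, accuracy


# Hypothetical toy labels for the three tweet categories (illustration only)
y_true = ["terror", "terror", "terror", "religious", "religious", "unrelated"]
y_pred = ["terror", "terror", "religious", "religious", "unrelated", "unrelated"]

metrics, accuracy = per_class_metrics(y_true, y_pred)
for label, scores in metrics.items():
    print(label, scores)
print("accuracy", accuracy)
```

In this toy run the "terror" class has perfect precision but misses one of its three tweets, so its F1 score falls below its precision; this gap is why F1 is preferred over plain accuracy when classes are unbalanced.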

Summary

Introduction

Recent years have witnessed the rise of ISIS, also known as the Islamic State in Iraq and the Levant (ISIL), or Islamic State (IS) for short. In the past few months, the international alliance has driven ISIS out of its major strongholds and liberated them. In the past few years, the research landscape has shifted dramatically toward analyzing the political implications of online content. Several news outlets reported on a Russian-driven Facebook group that was able to amass 225,000 followers and organize rallies inside the USA from abroad [3, 4]. We discuss the main efforts in the literature to identify and analyze radical content and communities. Detection refers to the ability to identify hate speech, violence and terrorism by employing text classification methods.

Objectives

Methods

Results

Conclusion