Threatening Language Detection and Target Identification in Urdu Tweets

Maaz Amjad, Arkaitz Zubiaga, Alisa Zhila, Alexander Gelbukh, Noman Ashraf, Grigori Sidorov

doi:10.1109/access.2021.3112500

Open Access

https://doi.org/10.1109/access.2021.3112500

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 23
License type: CC BY 4.0

Affiliation: Queen Mary University of London, Film Independent

Abstract

Automatic detection of threatening language is an important task; however, most existing studies have focused on English as the target language, with limited work on low-resource languages. In this paper, we introduce and release a new dataset for threatening language detection in Urdu tweets to further research in this language. The proposed dataset contains 3,564 tweets manually annotated by human experts as either threatening or non-threatening. The threatening tweets are further classified by target into one of two types: threatening to an individual person or threatening to a group. This research follows a two-step approach: (i) classify a given tweet as threatening or non-threatening, and (ii) classify whether a threatening tweet is used to threaten an individual or a group. We compare three forms of text representation: two are count-based, where the text is represented using either character $n$-gram counts or word $n$-gram counts as feature vectors, and the third is based on fastText pre-trained word embeddings for Urdu. We perform several experiments using machine learning and deep learning classifiers, and our study shows that an MLP classifier with the combination of word $n$-gram features outperformed other classifiers in detecting threatening tweets. Further, an SVM classifier using fastText pre-trained word embeddings obtained the best results for the target identification task.
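The count-based representations and the two-step approach described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`char_ngrams`, `word_ngrams`, `two_step_classify`) are hypothetical, whitespace tokenization is an assumption, and the two classifiers are passed in as plain callables standing in for trained models such as the MLP and SVM named above.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Count overlapping word n-grams (assumes whitespace tokenization)."""
    tokens = text.split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def two_step_classify(
    tweet: str,
    is_threatening: Callable[[str], bool],
    target_of: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Step (i): threatening vs. non-threatening.
    Step (ii): only for threatening tweets, individual vs. group.
    Both callables are placeholders for trained classifiers."""
    if not is_threatening(tweet):
        return ("non-threatening", None)
    return ("threatening", target_of(tweet))


if __name__ == "__main__":
    # Placeholder text; real inputs would be Urdu tweets.
    print(char_ngrams("abab", 2))
    print(word_ngrams("a b a b", 2))
    print(two_step_classify("a b", lambda t: True, lambda t: "group"))
```

In practice the `Counter` objects would be mapped onto a fixed vocabulary to form feature vectors before being passed to a classifier; that step is omitted here.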

Highlights

The emergence of the Internet and communication technology has enabled online social networks to become a significant part of our daily lives, as the number of social media users is growing exponentially.
Some users manipulate the Twitter platform to threaten other people and to promote violence by posting threatening content. This has led to a growing body of research investigating the spread of threatening content in social media, including by examining threatening language and by attempting to detect this type of content [8,9,10]. Given the distress this can cause to online users, furthering research in automatic threatening language identification is of utmost importance to tackle this problem at the scale of a large social media platform like Twitter.
Precision, Recall, and F1 scores are presented for all models: Logistic Regression (LR), Multilayer Perceptron (MLP), AdaBoost, Random Forest (RF), Support Vector Machine (SVM).
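For reference, the three metrics named above can be computed for a single positive class as in the following sketch. The function name `precision_recall_f1` and the label strings are assumptions for illustration, and the paper may report a different averaging scheme (for example, macro-averaged scores across classes).

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "threatening",
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one positive class.
    Returns 0.0 for any metric whose denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    gold = ["threatening", "threatening", "non-threatening", "non-threatening"]
    pred = ["threatening", "non-threatening", "threatening", "non-threatening"]
    print(precision_recall_f1(gold, pred))
```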

Summary

INTRODUCTION

The emergence of the Internet and communication technology has enabled online social networks to become a significant part of our daily lives, as the number of social media users is growing exponentially. Some users manipulate the Twitter platform to threaten other people and to promote violence by posting threatening content (i.e., content expressing an intent to cause harm to others). This has led to a growing body of research investigating the spread of threatening content in social media, including by examining threatening language and by attempting to detect this type of content [8,9,10]. Given the distress this can cause to online users, furthering research in automatic threatening language identification is of utmost importance to tackle this problem at the scale of a large social media platform like Twitter.

RELATED WORK

Methods

DATASET STATISTICS

BENCHMARKS

EXPERIMENT SETTINGS

DEEP LEARNING CLASSIFIERS

RESULTS AND ANALYSIS

CONCLUSION AND FUTURE WORK


Similar Papers

A Machine Learning Approach with Human-AI Collaboration for Automated Classification of Patient Safety Event Reports: Algorithm Development and Validation Study.
Hongbo Chen ... Dulaney Wilson
JMIR Human Factors | VOL. 11
25 Jan 2024

Multi-class sentiment analysis of urdu text using multilingual BERT
Lal Khan ... Ammar Amjad
Scientific Reports | VOL. 12
31 Mar 2022

Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study.
Shyam Visweswaran ... Sanya B Taneja
Journal of Medical Internet Research | VOL. 22
12 Aug 2020

Kurdish Fake News Detection Based on Machine Learning Approaches
Dana Salh ... Rebwar Nabi
Passer Journal of Basic and Applied Sciences | VOL. 5
01 Dec 2023
