Birds of prey: identifying lexical irregularities in spam on Twitter

Kyle Robinson, Vijay Mago

doi:10.1007/s11276-018-01900-9

Abstract

The advent of spam on social media platforms has led to a number of problems, not only for social media users but also for researchers mining social media data. While there has been substantial research on automated methods of spam detection on Twitter, research on the lexical content of spam on the platform is limited. A dataset of 301 million generic tweets was filtered through a URL blacklisting service to obtain 7,207 tweets containing links to malicious web pages. These tweets, considered spam, were combined with a random sample of non-spam tweets to obtain an overall dataset of 14,414 tweets. A total of 12 numerical tweet features were used to train and test a Random Forest algorithm, which achieved an overall classification accuracy of over 90%. In addition to the numerical features, the text of each tweet was processed to create four frequency-mapped corpora pertaining uniquely to spam and non-spam data. The corpora of words, emoji, numbers, and stop-words for spam and non-spam were plotted against each other to visualize differences in usage between the two groups. A clear distinction between the words and emoji used in spam and non-spam tweets was observed.
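The frequency-mapped corpora described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the tiny stop-word list, the emoji code-point ranges, and the names `build_corpora` and `relative_frequencies` are all assumptions made for illustration. It simply counts words, emoji, numbers, and stop-words in a list of tweet texts, so that the counts for a spam set and a non-spam set could be compared.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the paper's list is not given here).
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "in", "is", "you", "for"}

# Rough emoji detection by code-point range (an assumption, not exhaustive).
EMOJI_RANGES = ((0x1F300, 0x1FAFF), (0x2600, 0x27BF))

# URLs are matched first so they can be skipped; then alphabetic tokens and digit runs.
TOKEN_RE = re.compile(r"https?://\S+|[A-Za-z']+|\d+")


def is_emoji(ch):
    """Return True if the character falls in one of the assumed emoji ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def build_corpora(tweets):
    """Return frequency maps for words, emoji, numbers, and stop-words."""
    corpora = {name: Counter() for name in ("words", "emoji", "numbers", "stop_words")}
    for text in tweets:
        corpora["emoji"].update(ch for ch in text if is_emoji(ch))
        for token in TOKEN_RE.findall(text.lower()):
            if token.startswith("http"):
                continue  # URLs are ignored in this sketch
            if token.isdigit():
                corpora["numbers"][token] += 1
            elif token in STOP_WORDS:
                corpora["stop_words"][token] += 1
            else:
                corpora["words"][token] += 1
    return corpora


def relative_frequencies(counter):
    """Normalize raw counts so corpora of different sizes can be compared."""
    total = sum(counter.values())
    return {tok: n / total for tok, n in counter.items()} if total else {}
```

Calling `build_corpora` once on the spam tweets and once on the non-spam tweets yields two sets of counters. Passing matching counters through `relative_frequencies` gives values that could be plotted against each other, in the spirit of the comparison the abstract describes.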

Journal: Wireless Networks
Publication Date: Dec 11, 2018
Citations: 7
