Improvised Spam Detection in Twitter Data Using Lightweight Detectors and Classifiers

Velammal B L, Aarthy N

doi:10.4018/ijwltt.20210701.oa2

Open Access

https://doi.org/10.4018/ijwltt.20210701.oa2

Abstract

Receiving spam messages is one of the most serious issues in social media, especially on Twitter, a widely used platform for expressing individual opinions and emotions, both publicly and within groups of members who share similar views or discussion topics. In such focused discussion groups, spam received through social media sites is especially annoying. In this paper, a system is developed to detect spam tweets using four lightweight detectors: a blacklist domain detector, a near-duplicate detector, a reliable ham detector, and a multiclass detector. The detected tweets are then classified using an ensemble of classifiers: naïve Bayes, logistic regression, and random forest. A voting method is applied to decide the final labels for the tweets after classification. The proposed system achieved an accuracy of 79% in detecting spam tweets with the naïve Bayes classifier, and the accuracy appears likely to improve further as more sample data becomes available.
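The voting step described in the abstract can be illustrated with a minimal sketch. It assumes simple majority voting over the labels produced by the three classifiers; the abstract does not say whether the authors used majority or weighted voting, and the function names, label strings, and sample predictions below are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


def vote_per_tweet(predictions):
    """Combine per-classifier predictions into one label per tweet.

    predictions maps a classifier name to its list of labels,
    one label per tweet, all lists in the same tweet order.
    """
    per_tweet = zip(*predictions.values())
    return [majority_vote(list(labels)) for labels in per_tweet]


# Hypothetical outputs of the three classifiers for three tweets.
predictions = {
    "naive_bayes": ["spam", "ham", "spam"],
    "logistic_regression": ["spam", "ham", "ham"],
    "random_forest": ["ham", "ham", "spam"],
}
final_labels = vote_per_tweet(predictions)
print(final_labels)
```

With an odd number of classifiers and two classes, a tie cannot occur, so the tie-breaking rule only matters if a classifier is added or removed.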

Highlights

Nowadays, social media has become one of the most popular and unavoidable means of communication among individuals.
The proposed system achieved an accuracy of 79% in detecting spam tweets with the naïve Bayes classifier, and the accuracy appears likely to improve further as more sample data becomes available.
The system can detect such collective spamming behavior, so that precautionary measures can be taken to restrict such messages. Another way of achieving this is to cluster tweets with the same final Uniform Resource Locator (URL) into a campaign, partitioning the Twitter dataset into numerous campaigns based on URLs.

Summary

INTRODUCTION

Nowadays, social media has become one of the most popular and unavoidable means of communication among individuals. The system can detect such collective spamming behavior, so that precautionary measures can be taken to restrict such messages. Another way of achieving this is to cluster tweets with the same final URL into a campaign, partitioning the Twitter dataset into numerous campaigns based on URLs. In that approach, a detailed analysis is performed over the campaign data to generate a set of useful features for classifying a campaign into one of two classes: spam or legitimate. The effects of spam include annoyance to individual users, less reliable e-mail, loss of work productivity, misuse of network bandwidth, and wasted file server storage space and computational power. Spam can also spread viruses, worms, and Trojan horses, and cause financial losses through phishing, Denial of Service (DoS), and directory harvesting attacks. Y, (2012), indicating that users are more prone to trust spam messages from their friends in online social networks (OSNs).
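The campaign clustering mentioned above can be sketched as grouping tweets by their final URL. This is an illustration under stated assumptions, not the method from the cited work: it assumes shortened links have already been resolved to their final destination, and the `final_url` and `id` field names, the normalization rules, and the sample tweets are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment and a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def group_into_campaigns(tweets):
    """Map each normalized final URL to the ids of tweets that link to it."""
    campaigns = defaultdict(list)
    for tweet in tweets:
        campaigns[normalize_url(tweet["final_url"])].append(tweet["id"])
    return dict(campaigns)


# Hypothetical tweets whose links were already resolved to final URLs.
tweets = [
    {"id": 1, "final_url": "http://Example.com/offer/"},
    {"id": 2, "final_url": "http://example.com/offer"},
    {"id": 3, "final_url": "https://example.org/news#top"},
]
campaigns = group_into_campaigns(tweets)
print(campaigns)
```

Each resulting group is one candidate campaign; features computed per group could then feed a spam-versus-legitimate classifier.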

BACKGROUND

EXPERIMENTAL SETUP

DATA PREPROCESSING

Blacklist Domain Detector

Near Duplicate Detector

Reliable Ham Detector

Naïve Bayes

Logistic Regression

EXPERIMENTS AND RESULTS

CONCLUSION

Journal: International Journal of Web-Based Learning and Teaching Technologies	Publication Date: Jul 1, 2021
Citations: 1	License type: CC BY 3.0
