Abstract

Receiving spam messages is one of the most serious issues in social media, especially on Twitter, a widely used platform where individuals publicly express their opinions and emotions, both to a general audience and to specific groups of members who share similar views or discussion topics. In such focused discussion groups, receiving spam messages is particularly annoying. In this paper, a system is developed to detect spam tweets using four lightweight detectors, namely a blacklist domain detector, a near-duplicate detector, a reliable ham detector, and a multiclass detector. The detected tweets are then classified using an ensemble of classifiers: naïve Bayes, logistic regression, and random forest. A voting method is applied to decide the final label of each tweet after classification. The proposed system achieved an accuracy of 79% in detecting spam tweets with the naïve Bayes classifier, and this accuracy appears likely to improve further as more sample data become available.
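The abstract names the detectors and the voting step but does not describe their internals here. The following is a minimal Python sketch, under stated assumptions, of two of the lightweight detectors (blacklist domain and near-duplicate) and of majority voting over classifier labels. The blacklist contents, the 3-word shingle size, the 0.8 Jaccard threshold, and all function names are hypothetical choices for illustration, not values or code from the paper.

```python
from collections import Counter
from urllib.parse import urlparse
import re

# Hypothetical blacklist; the paper's actual blacklist source is not given here.
BLACKLISTED_DOMAINS = {"spam-example.com", "bad-links.example"}

URL_PATTERN = re.compile(r"https?://\S+")


def blacklist_domain_detector(tweet: str, blacklist=BLACKLISTED_DOMAINS) -> bool:
    """Flag a tweet as spam if any URL in it points to a blacklisted domain."""
    for url in URL_PATTERN.findall(tweet):
        host = (urlparse(url).hostname or "").lower()
        # Match the blacklisted domain itself or any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in blacklist):
            return True
    return False


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a lowercased tweet (k is assumed)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicate_detector(tweet: str, known_spam, threshold: float = 0.8) -> bool:
    """Flag a tweet whose shingle set is close to that of any known spam tweet."""
    s = shingles(tweet)
    return any(jaccard(s, shingles(t)) >= threshold for t in known_spam)


def majority_vote(labels) -> str:
    """Return the most common label among the classifiers' predictions."""
    return Counter(labels).most_common(1)[0][0]


# Example: labels predicted by three classifiers (e.g. NB, LR, RF) for one tweet.
print(majority_vote(["spam", "ham", "spam"]))  # spam
```

With three classifiers voting on two labels (spam or ham), a tie cannot occur, so `majority_vote` always returns a strict majority; with an even number of voters, ties would be broken by whichever label appears first in the list.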

Highlights

  • Nowadays, social media has become one of the most pervasive and popular means of communication among individuals

  • The proposed system achieved an accuracy of 79% in detecting spam tweets with the naïve Bayes classifier, and this accuracy appears likely to improve further as more sample data become available

  • The system can detect such collective spamming behavior, so that precautionary measures can be taken to restrict such messages. Another way of achieving this is to cluster tweets that share the same final Uniform Resource Locator (URL) into a campaign, partitioning the Twitter dataset into numerous URL-based campaigns


Summary

INTRODUCTION

Nowadays, social media has become one of the most pervasive and popular means of communication among individuals. The system can detect such collective spamming behavior, so that precautionary measures can be taken to restrict such messages. Another way of achieving this is to cluster tweets that share the same final URL into a campaign, partitioning the Twitter dataset into numerous URL-based campaigns. A detailed analysis is then performed over the campaign data to generate a set of useful features for classifying each campaign into one of two classes: spam or legitimate. Spam is commonly regarded as an annoyance to individual users; it also makes e-mail less reliable, reduces work productivity, misuses network bandwidth, and wastes file-server storage space and computational power. It can further involve the spread of viruses, worms, and Trojan horses, as well as financial losses through phishing, Denial of Service (DoS) attacks, and directory harvesting attacks. Users are also more prone to trust spam messages from their friends in online social networks (OSNs), as indicated in Y, (2012).
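The campaign idea above (grouping tweets by the final URL they point to) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes short links have already been resolved offline into a `redirects` dictionary (no network requests are made), and the function names, short-link hosts, and example URLs are hypothetical.

```python
from collections import defaultdict
import re

URL_PATTERN = re.compile(r"https?://\S+")


def final_url(url: str, redirects: dict) -> str:
    """Follow a precomputed redirect map (short link -> target) to its end.

    Assumption: redirects were resolved offline; the 'seen' set guards
    against redirect cycles so the loop always terminates.
    """
    seen = set()
    while url in redirects and url not in seen:
        seen.add(url)
        url = redirects[url]
    return url


def build_campaigns(tweets, redirects: dict) -> dict:
    """Group tweet ids into campaigns keyed by the final URL they link to."""
    campaigns = defaultdict(list)
    for tweet_id, text in tweets:
        # A set avoids adding the same tweet twice when it repeats a URL.
        finals = {final_url(u, redirects) for u in URL_PATTERN.findall(text)}
        for target in finals:
            campaigns[target].append(tweet_id)
    return dict(campaigns)


# Hypothetical data: two different short links resolve to the same target.
tweets = [
    (1, "Free gift http://bit.ly/abc"),
    (2, "Claim now http://tinyurl.com/xyz"),
    (3, "Hello world"),
    (4, "Docs https://example.org/page"),
]
redirects = {
    "http://bit.ly/abc": "http://spam-example.com/gift",
    "http://tinyurl.com/xyz": "http://spam-example.com/gift",
}
print(build_campaigns(tweets, redirects))
```

Tweets 1 and 2 fall into the same campaign even though their visible links differ, which is why grouping by the final URL rather than the posted URL matters. A tweet containing several distinct URLs joins several campaigns in this sketch; tweets without URLs join none. Each resulting campaign would then be the unit from which spam/legitimate features are computed.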

BACKGROUND
EXPERIMENTAL SETUP
DATA PREPROCESSING
Blacklist Domain Detector
Near Duplicate Detector
RELIABLE HAM DETECTOR
Naïve Bayes
Logistic Regression
EXPERIMENTS AND RESULTS
CONCLUSION