Abstract

Twitter is one of the fastest growing microblogging and online social networking site that enables users to send and receive messages in the form of tweets. Twitter is the trend of today for news analysis and discussions. That is why Twitter has become the main target of attackers and cybercriminals. These attackers not only hamper the security of Twitter but also destroy the whole trust people have on it. Hence, making Twitter platform impure by misusing it. Misuse can be in the form of hurtful gossips, cyberbullying, cyber harassment, spams, pornographic content, identity theft, common Web attacks like phishing and malware downloading, etc. Twitter world is growing fast and hence prone to spams. So, there is a need for spam detection on Twitter. Spam detection using supervised algorithms is wholly and solely based on the labelled dataset of Twitter. To label the datasets manually is costly, time-consuming and a challenging task. Also, these old labelled datasets are nowadays not available because of Twitter data publishing policies. So, there is a need to design an approach to label the tweets as spam and non-spam in order to overcome the effect of spam drift. In this paper, we downloaded the recent dataset of Twitter and prepared an unlabelled dataset of tweets from it. Later on, we applied the cluster-then-label approach to label the tweets as spam and non-spam. This labelled dataset can then be used for spam detection in Twitter and categorization of different types of spams.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.