Abstract

Twitter spam classification is a difficult challenge for social media platforms and cybersecurity companies. Twitter spam carrying illegal links may evolve over time to deceive filtering models, causing disastrous losses to both users and the network as a whole. We define this distributional evolution as a concept drift scenario. To build an effective model, we adopt K-L divergence to represent the spam distribution and use a multiscale drift detection test (MDDT) to localize possible drifts in it. A base classifier is then retrained according to the detection result to improve performance. Comprehensive experiments show that, when a drift occurs, K-L divergence changes in a highly consistent pattern across features. The MDDT is also shown to improve the final classification result in accuracy, recall, and F-measure.

Highlights

  • Social media is ubiquitous nowadays, evolving its functions from personal sharing with friends to communicating with strangers of similar interests [1]

  • We present a drifted Twitter spam classification method based on the multiscale drift detection test (MDDT) [19]

  • In this paper, we have presented a drifted Twitter spam classification method that applies the multiscale drift detection test (MDDT) to K-L divergence


Summary

INTRODUCTION

Social media is ubiquitous nowadays, and its functions have evolved from personal sharing with friends to communicating with strangers who share similar interests [1]. Data-driven models use classification algorithms or anomaly detection methods to find spam among normal tweets. Based on a resampling scheme and a paired Student's t-test, we have proposed a multiscale drift detection test (MDDT) that localizes abrupt drift points when a concept changes [19]. It applies a detection procedure on two different scales. The main idea is to detect distributional change and use the drifted data to update the classification model. MDDT checks whether the current data concepts differ from the historical ones and, if so, reports the drift time. We use MDDT [19] to localize drift points in a time window W; the procedure is described in Algorithm 2 and Fig. 2. The test then asks whether T can be further split so as to find an accurate boundary between drifted spam and historical spam (steps 5-6). If so, MDDT claims a drift point t∗.
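The two-scale idea described above can be illustrated with a minimal sketch. This is not the paper's Algorithm 2: the resampling scheme is omitted, and the function names (`kl_divergence`, `paired_t`, `localize_drift`) and parameters (`bins`, `smoothing`, `t_crit`, `min_seg`) are hypothetical choices made for illustration. The sketch assumes each window is a list of per-batch statistics for one feature, for example K-L divergence values measured against a historical batch. A coarse paired t-test first decides whether the current window differs from the reference window; only then does a finer scan look for the split point that best separates the window into an old and a new segment.

```python
import math
from statistics import mean, stdev, variance


def kl_divergence(p_samples, q_samples, bins=10, smoothing=0.5):
    """Histogram estimate of D_KL(P || Q) for one numeric feature."""
    lo = min(min(p_samples), min(q_samples))
    hi = max(max(p_samples), max(q_samples))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def histogram(samples):
        counts = [0] * bins
        for x in samples:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Additive smoothing keeps every bin probability strictly positive.
        total = len(samples) + smoothing * bins
        return [(c + smoothing) / total for c in counts]

    p, q = histogram(p_samples), histogram(q_samples)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def paired_t(a, b):
    """Paired Student's t statistic for two equally long sequences."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0.0:
        return 0.0 if mean(diffs) == 0 else math.copysign(math.inf, mean(diffs))
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def localize_drift(reference, current, t_crit=3.0, min_seg=5):
    """Return the index in `current` where drift appears to start, or None.

    t_crit is an assumed critical value, not one taken from the paper.
    """
    if len(reference) != len(current):
        raise ValueError("windows must have the same length")

    # Coarse scale: does the current window differ from the reference at all?
    if abs(paired_t(current, reference)) <= t_crit:
        return None

    # Fine scale: scan candidate split points inside the current window and
    # keep the one whose left and right segments differ most (Welch-style score).
    best_t, best_score = None, 0.0
    for t in range(min_seg, len(current) - min_seg + 1):
        left, right = current[:t], current[t:]
        se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
        diff = abs(mean(right) - mean(left))
        score = diff / se if se > 0 else (math.inf if diff > 0 else 0.0)
        if score > best_score:
            best_t, best_score = t, score

    # If no split is significant, treat the whole window as drifted.
    return best_t if best_score > t_crit else 0


if __name__ == "__main__":
    reference = [0.10 + 0.01 * ((i * 7) % 5 - 2) for i in range(40)]
    current = [0.10 + 0.01 * ((i * 3) % 5 - 2) + (0.5 if i >= 25 else 0.0)
               for i in range(40)]
    print("estimated drift index:", localize_drift(reference, current))
```

In a full pipeline, the data from the reported index onward would be the drifted portion used to retrain the base classifier, as the summary describes. A fixed `t_crit` is a simplification; a proper test would derive the threshold from the t distribution at a chosen significance level.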

EXPERIMENTS AND RESULTS
DATASET
CONCLUSION AND FUTURE WORK
