Abstract

Spam is thriving on Arabic Twitter. With a large online population, mounting political unrest, and an undersized and unspecialized response effort, the current state of Arabic online social networks (OSNs) offers a perfect target for the spam industry, bringing both abuse and manipulation to the scene. The result is a ubiquitous spam presence that redefines the signal-to-noise ratio and makes spam a de facto component of online social platforms. English spam on online social networks has been heavily studied in the literature. To date, however, social spam in other languages has been largely ignored. Our own analysis of Arabic trending hashtags in Saudi Arabia estimates that spam makes up about three quarters of the total generated content. This alarming rate, backed by independent concurrent estimates, makes the development of adaptive spam detection techniques a very real and pressing need. In this study, we present a first attempt at detecting accounts that promote spam and content pollution on Arabic Twitter. Using a large crawled dataset of more than 23 million Arabic tweets and a manually labeled sample of more than 5000 tweets, we analyze the spam content on Saudi Twitter and assess the performance of previously proposed spam detection features on our recently gathered dataset. We also adapt these features to counter spammers' evasion techniques, and use the adapted features to build a new, highly accurate data-driven detection system.


