Abstract

Social media expose millions of users every day to information campaigns - some emerging organically from grassroots activity, others sustained by advertising or other coordinated efforts. These campaigns contribute to the shaping of collective opinions. While most information campaigns are benign, some may be deployed for nefarious purposes, including terrorist propaganda, political astroturf, and financial market manipulation. It is therefore important to be able to detect whether a meme is being artificially promoted at the very moment it becomes wildly popular. This problem has important social implications and poses numerous technical challenges. As a first step, here we focus on discriminating between trending memes that are either organic or promoted by means of advertisement. The classification is not trivial: ads cause bursts of attention that can be easily mistaken for those of organic trends. We designed a machine learning framework to classify memes that have been labeled as trending on Twitter. After trending, we can rely on a large volume of activity data. Early detection, occurring immediately at trending time, is a more challenging problem due to the minimal volume of activity data that is available prior to trending. Our supervised learning framework exploits hundreds of time-varying features to capture changing network and diffusion patterns, content and sentiment information, timing signals, and user meta-data. We explore different methods for encoding feature time series. Using millions of tweets containing trending hashtags, we achieve 75% AUC score for early detection, increasing to above 95% after trending. We evaluate the robustness of the algorithms by introducing random temporal shifts on the trend time series. Feature selection analysis reveals that content cues provide consistently useful signals; user features are more informative for early detection, while network and timing features are more helpful once more data is available.

Highlights

  • An increasing number of people rely, at least in part, on information shared on social media to form opinions and make choices on issues related to lifestyle, politics, health, and products purchases [ – ]

  • We identified an algorithm, called K-Nearest Neighbor with Dynamic Time Warping (KNN-dynamic time warping (DTW)), that is capable of dealing with multidimensional time series classification

  • In principle we could use the entire time series for classification, ex-post information would not serve our goal of early detection of social media campaigns in a streaming scenario that resembles a real setting, where information about the future evolution of a trend is obviously unavailable

Read more

Summary

Introduction

An increasing number of people rely, at least in part, on information shared on social media to form opinions and make choices on issues related to lifestyle, politics, health, and products purchases [ – ]. Such reliance provides a variety of entities - from single users to corporations, interest groups, and governments - with motivation to influence collective opinions through active participation in online conversations. There are obvious incentives for the adoption of covert methods that enhance both perceived and actual popularity of promoted information. Even when the intentions of the promoter are benign, we interpret large (but possibly artificially enhanced) popularity as widespread endorsement of, or trust in, the promoted information

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.