Abstract

Predicting information cascades plays a crucial role in various applications, such as advertising campaigns, emergency management and infodemic control. However, predicting the scale of an information cascade in the long term can be difficult. In this study, we take Weibo, a Twitter-like online social platform, as an example, exhaustively extract predictive features from the data, and use a conventional machine learning algorithm to predict information cascade scales. Specifically, we compare the predictive power (and the loss of it) of different categories of features in short-term and long-term prediction tasks. Among the features that describe the user following network, retweeting network, tweet content and early diffusion dynamics, we find that early diffusion dynamics are the most predictive in short-term prediction tasks but lose most of their predictive power in long-term tasks. In-depth analyses reveal two possible causes of this failure: the bursty nature of information diffusion and the temporal drift of features. Our findings further the understanding of the information diffusion process and may assist in controlling it.

Highlights

  • The study of information diffusion in online social networks has practical value in various domains, such as advertisement royalsocietypublishing.org/journal/rsos R

  • Building on prior work, we further explore unpredictable events in information diffusion processes and summarize several common scenarios in which the actual tweet popularity diverges substantially from the predicted scale

  • We compare the predictive power of different groups of features for short- and long-term cascade sizes and find that the prediction error increases sharply with the prediction gap

Summary

Introduction

The study of information diffusion in online social networks has practical value in various domains, such as advertisement. The effect of these factors may change dynamically with the diffusion process, making long-term popularity prediction a challenging task [7]. A typical burst scenario in online information diffusion is when a tweet has low activity (e.g. a small number of retweets) early in the diffusion process but suddenly gains popularity owing to a retweet by a key opinion leader. In this case, the ML model features constructed from the early retweeting dynamics are no longer valid for the prediction task. We consider a Twitter-like online social networking service as a typical example of information diffusion and adopt a best-practice ML method to predict the cascade size, i.e. the number of people who retweeted an original post. The rest of the paper is organized as follows: §2 describes the data we used; §3 describes the features we used and analyses the prediction results of various features; §4 explores some temporal dynamics that make cascade size difficult to predict; and §5 is the conclusion.
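The cascade size defined above, and the burst scenario that undermines early-dynamics features, can be illustrated with a minimal Python sketch. The retweet log, the user IDs, the `cascade_size` helper and the 1-hour, 24-hour and 48-hour horizons are all hypothetical choices made for illustration; they are not the data, code or settings used in the paper.

```python
from datetime import datetime, timedelta


def cascade_size(retweets, posted_at, horizon):
    """Count distinct users who retweeted within `horizon` of the original post."""
    cutoff = posted_at + horizon
    return len({user for user, ts in retweets if posted_at <= ts <= cutoff})


# Hypothetical original post time and retweet log of (user_id, timestamp) pairs.
posted_at = datetime(2020, 1, 1, 8, 0)
retweets = [
    ("u1", posted_at + timedelta(minutes=10)),
    ("u2", posted_at + timedelta(minutes=40)),
    ("u1", posted_at + timedelta(hours=2)),             # same user again, counted once
    ("u3", posted_at + timedelta(hours=30)),            # late burst starts here
    ("u4", posted_at + timedelta(hours=30, minutes=5)),
    ("u5", posted_at + timedelta(hours=31)),
]

# Assumed horizons: a 1-hour observation window and two prediction targets.
early_size = cascade_size(retweets, posted_at, timedelta(hours=1))
short_term_size = cascade_size(retweets, posted_at, timedelta(hours=24))
long_term_size = cascade_size(retweets, posted_at, timedelta(hours=48))

print(early_size, short_term_size, long_term_size)
```

In this toy log the early count matches the 24-hour size but says nothing about the burst that arrives after 30 hours, which is the kind of divergence between short-term and long-term targets that the paragraph describes.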

Data description and preprocessing
Predicting information diffusion scale
Features for machine learning model training
Following network structure
Retweeting network structure
Early retweet dynamics
Tweet content
Why are long-term predictions difficult?
Hard-to-predict dynamics in multi-peak propagations
The variability of the original time of posting
Feature temporal drifting
Findings
Conclusion