Abstract

Studying the bursty nature of cascades in social media is practically important in many real applications such as product sales prediction, disaster relief, and stock market prediction. Although both the cascade size prediction and the burst patterns of the cascades have been extensively studied, how to predict when a burst will come remains an open problem. It is challenging for traditional time-series-based models such as regression models to address this task directly. Firstly, times-series-based prediction models focus on predicting the future values based on previously observed ones. It is hard to apply them to predict the time of a bursts with the "quick rise-and-fall" pattern. Secondly, besides the cascade popularity, a lot of other side information like user profile and social relation are available in social media. Although the potential utility of such information can be high, it is also hard for time-series-based models to capture and integrate these rich information with diverse formats seamlessly. This paper proposes a classification-based approach for burst time prediction by exploiting rich knowledge in information diffusion. Particularly, we first propose a time-window-based transformation to predict in which time window the burst will appear. By dividing the time spans of all the cascades into the same number of time windows K, the cascades with diverse time spans can thus be handled uniformly. To exploit the rich and heterogenous information in social media, we next propose a scale-independent feature extraction framework to model the heterogenous knowledge in a scale-independent manner. Systematical evaluations are conducted on the Sina Weibo reposting dataset and MemeTracker dataset. Besides the superior performance of the proposed approach, we also observe that: (1) surprisingly, social/structure knowledge is more indicative of the bursts than the cascade popularity information, especially for the bursts occurring in a farther future. (2) Larger cascades are harder to predict as the spreading process of the cascades with higher popularity is usually more diverse and fluctuant. (3) The proposed approach is robust in the sense that the result is not much sensitive to the popularity of the training cascades.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.