Abstract

Predicting diffusions on big social media data using natural language processing (NLP) and social network analysis (SNA) techniques is an emerging research domain. To predict diffusions of novel topics, previous studies focus on cross-topic-observed diffusions, in which the diffusion between the source and target users has not been observed for the topic to be predicted but has been observed for other topics. However, in real-world social networks, many of the diffusions to be predicted are actually unobserved. For example, a diffusion may be unseen (no diffusion between its source and target users is observed in the training data), or may even involve silence (one or both of its users have never participated in a diffusion before). In this paper, we generalize the problem of diffusion prediction on novel topics to cover both cross-topic-observed and unobserved diffusions. This is very challenging because the relevant previous diffusion records are lacking. We design a learning-based framework to solve the problem. Leveraging NLP and SNA techniques to deal with such Big Data, we exploit the latent semantics derived from diverse information sources (e.g., user, topic, user-topic, and topological) to extract features for prediction. We also build on the observation that users with the same attribute value tend to behave similarly on similar topics. Our framework is evaluated on real-world microblog data, and the experiments show that it achieves 73% AUC on this difficult prediction task. Our dataset is also publicly available at http://mslab.csie.ntu.edu.tw/~tim/ase_big_data_2015.zip.
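The three kinds of diffusion distinguished above (cross-topic-observed, unseen, and silence) can be made concrete with a small classifier. The following is a minimal Python sketch, not the authors' implementation. It assumes that a diffusion is a directed (source, target, topic) triple, that the novel topic to be predicted never appears in the training records, and that a user "participates" in a diffusion by appearing as either its source or its target. The function name `diffusion_category` and the record layout are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical record layout: (source_user, target_user, topic).
Diffusion = Tuple[str, str, str]


def diffusion_category(src: str, tgt: str, training: Iterable[Diffusion]) -> str:
    """Classify a diffusion src -> tgt on a novel topic against training records.

    Assumes the topic being predicted never occurs in `training`, so any
    observed (src, tgt) pair must have been observed on other topics.
    """
    participants = set()    # users appearing as source or target anywhere
    observed_pairs = set()  # directed (source, target) pairs seen in training
    for s, t, _topic in training:
        participants.update((s, t))
        observed_pairs.add((s, t))

    if src not in participants or tgt not in participants:
        return "silence"            # one or both users never took part in a diffusion
    if (src, tgt) not in observed_pairs:
        return "unseen"             # both users active, but this pair never diffused
    return "cross-topic-observed"   # pair observed, but only on other topics
```

For example, with training records `[("alice", "bob", "sports"), ("bob", "carol", "music")]`, the pair `("alice", "bob")` is cross-topic-observed, `("alice", "carol")` is unseen, and `("alice", "dave")` involves silence. Checking silence before the pair lookup mirrors the abstract's framing of silence as the harder special case of an unobserved diffusion.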
