Abstract

We are witnessing increasing interests in longitudinal analytics on social media data. Longitudinal analytics takes into account an interval and considers the temporal popularity of social media data in the interval, rather than only considering recently generated social media data in real-time search. We study a fundamental functionality in longitudinal analytics—the top-k temporal keyword (TkTK) querying. A TkTK query takes as input a set of query keywords and an interval, and returns the top-k most significant social items, e.g., tweets, where the significance of a social item is defined based on a combination of the textual relevance and temporal popularity. We model social media data as a forest of linkage trees along the time dimension, which well models the propagation processes, e.g., replies and forwards, among different social items. Based on the forest, we model the temporal popularity of a social item across time as a popularity time series. We design two indexing structures that index social items' popularity time series and textual content in a holistic manner—the temporal popularity inverted index (TPII) and the log-structured merge octree (LSMO). Empirical studies with two substantial social media data sets offer insight into the design properties of the indexes and confirm that LSMO enables both efficient query processing and indexing structure updates.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.