Abstract

In this paper, we investigate the selectivity estimation problem for streaming spatio-textual data, which arises in many social network and geo-location applications. Specifically, given a set of continuously and rapidly arriving spatio-textual objects, each described by a geo-location and a short text, we aim to accurately estimate the cardinality of a spatial keyword query over the objects seen so far, where a spatial keyword query consists of a search region and a set of query keywords. To the best of our knowledge, this is the first work to address this important problem. We first extend two existing techniques to solve this problem and show their limitations. Inspired by two key observations on the "locality" of the correlations among query keywords, we propose a local correlation based method that uses an augmented adaptive space partition tree (A2SP-tree for short) to approximately learn a local Bayesian network on the fly for a given query and estimate its selectivity. A novel local boosting approach is presented to further enhance the learning accuracy of local Bayesian networks. Our comprehensive experiments on real-life datasets demonstrate that the local correlation based algorithm achieves higher estimation accuracy than its competitors.
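
To make the problem statement concrete, the following minimal sketch computes the exact cardinality that a selectivity estimator is meant to approximate. It is a brute-force baseline, not the method proposed in the paper. The class and function names are hypothetical, the search region is assumed to be an axis-aligned rectangle, and an object is assumed to match only if it contains all query keywords; the abstract does not specify either choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatioTextualObject:
    # Hypothetical representation: a 2D location plus a set of keywords.
    x: float
    y: float
    keywords: frozenset


@dataclass(frozen=True)
class SpatialKeywordQuery:
    # Assumption: the search region is an axis-aligned rectangle.
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    keywords: frozenset


def matches(obj, query):
    """Return True if obj lies in the region and has every query keyword."""
    inside = (query.x_min <= obj.x <= query.x_max
              and query.y_min <= obj.y <= query.y_max)
    # Assumption: conjunctive (AND) keyword semantics.
    return inside and query.keywords <= obj.keywords


def exact_cardinality(stream, query):
    """Count matching objects seen so far by scanning all of them."""
    return sum(1 for obj in stream if matches(obj, query))


seen_so_far = [
    SpatioTextualObject(1.0, 1.0, frozenset({"coffee", "shop"})),
    SpatioTextualObject(2.0, 2.0, frozenset({"coffee"})),
    SpatioTextualObject(5.0, 5.0, frozenset({"coffee", "shop"})),
    SpatioTextualObject(0.5, 1.5, frozenset({"coffee", "shop", "wifi"})),
]
query = SpatialKeywordQuery(0.0, 0.0, 3.0, 3.0, frozenset({"coffee", "shop"}))
print(exact_cardinality(seen_so_far, query))  # prints 2
```

Scanning every stored object is impractical for a fast, unbounded stream, which is why the problem calls for a compact summary that estimates this count instead of computing it exactly.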
