Abstract
With the growing availability of Location-Based Services (LBS) and mobile internet, the volume of spatial data continues to grow. This growth poses new requirements and challenges for distributed indexing and query processing on large-scale spatial data. A scalable, distributed spatial index is essential for efficient Nearest Neighbor (NN) queries. Several approaches implement distributed indices and NN query processing with MapReduce, such as R-tree and Voronoi-based indices. However, the R-tree is poorly suited to parallelization, and Voronoi-based indices require extra computation for localization or local index reconstruction. In this paper, we investigate how to perform NN queries in a distributed environment. First, we present distributed approaches for constructing a novel distributed spatial data index, the Inverted Grid Index, which combines an inverted index with grid partitioning. Second, we describe the implementation of two typical applications built on this index in a cloud computing environment: distributed k Nearest Neighbor (kNN) and Reverse Nearest Neighbor (RNN) queries. Finally, we evaluate our algorithms through extensive experiments on both real and synthetic data sets. The experiments show that index construction time decreases almost linearly as the number of cluster nodes increases. The results also demonstrate the efficiency and scalability of our NN query algorithms based on the Inverted Grid Index.
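To make the idea of combining an inverted index with a grid partition concrete, the following is a minimal single-machine sketch in Python. It is an illustration under stated assumptions, not the MapReduce implementation evaluated in the paper: the function names `build_inverted_grid_index` and `knn`, the uniform square cells, and the ring-by-ring search strategy are all hypothetical choices made for this example.

```python
import heapq
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Return the (column, row) of the grid cell containing a 2D point."""
    x, y = point
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_inverted_grid_index(points, cell_size):
    """Inverted index: grid cell -> list of ids of the points inside it.

    `points` maps a point id to an (x, y) tuple. Uniform square cells
    are an assumption of this sketch.
    """
    index = defaultdict(list)
    for pid, point in points.items():
        index[cell_of(point, cell_size)].append(pid)
    return index


def knn(points, index, cell_size, query, k):
    """Return ids of the k points nearest to `query`, closest first.

    Cells are visited in square rings of growing radius around the
    query cell. Every point outside rings 0..r lies at least
    r * cell_size away, so the search stops once the current k-th
    distance is within that bound.
    """
    qc = cell_of(query, cell_size)
    # The farthest occupied cell bounds the number of rings to visit.
    max_ring = max(
        (max(abs(cx - qc[0]), abs(cy - qc[1])) for cx, cy in index),
        default=-1,
    )
    best = []  # max-heap on distance, stored as (-distance, point id)
    for ring in range(max_ring + 1):
        for cx in range(qc[0] - ring, qc[0] + ring + 1):
            for cy in range(qc[1] - ring, qc[1] + ring + 1):
                if max(abs(cx - qc[0]), abs(cy - qc[1])) != ring:
                    continue  # interior cell, handled by an earlier ring
                for pid in index.get((cx, cy), ()):
                    d = math.dist(query, points[pid])
                    if len(best) < k:
                        heapq.heappush(best, (-d, pid))
                    elif d < -best[0][0]:
                        heapq.heapreplace(best, (-d, pid))
        if len(best) == k and -best[0][0] <= ring * cell_size:
            break  # no unvisited cell can hold a closer point
    return [pid for _, pid in sorted(best, key=lambda item: -item[0])]


if __name__ == "__main__":
    sample = {0: (0.5, 0.5), 1: (1.5, 0.5), 2: (5.2, 5.1), 3: (0.9, 0.9)}
    grid = build_inverted_grid_index(sample, cell_size=1.0)
    print(knn(sample, grid, 1.0, query=(0.55, 0.6), k=2))
```

Because each cell's posting list is independent of the others, a structure of this kind can in principle be built by grouping points on their cell key, which is one plausible reason a grid-based inverted index lends itself to parallel construction.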