Abstract

Many emerging mobile applications require analyzing large spatial datasets. In these applications, efficient query processing relies on spatial access methods such as R-trees. For datasets that are fairly static, R-trees are often built as a data loading process using packing techniques. However, traditional R-tree packing algorithms can only run on a single machine and thereby cannot scale to very large datasets. In this paper, we design and implement a general framework for parallel Rtree packing using MapReduce. This framework sequentially packs each R-tree level from bottom up. For lower levels that have a large number of rectangles, we propose a partition based algorithm for parallel packing. We also discuss two spatial partitioning methods that can efficiently handle heavily skewed datasets. To evaluate the performance, we conducted extensive experiments using large real datasets. The size of the datasets is up to 100GB and the number of spatial objects is up to 2 billion. Besides range queries, k-nearest neighbor searches and spatial joins are also used for evaluation. To the best of our knowledge, it is the first work that evaluates the query performance of packed R-trees on such large datasets with spatial queries other than range queries. The results confirm the scalability of our proposed framework and parallel packing algorithms. It is also shown that our packed R-trees have good query performance and optimal space utilization.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.