R-Grove

Tin Vu, Ahmed Eldawy

doi:10.1145/3274895.3274984

Abstract

The rapid growth of big spatial data has urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to partition the data efficiently across machines. A widely used technique for big spatial indexing is to reuse existing search trees as-is, e.g., the R-tree family, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that this approach has major limitations that make it unsuitable for the big data environment. This paper studies the use of three popular trees from the R-tree family to index big spatial data, namely, the original R-tree by Guttman, R*-tree, and RR*-tree. We show that the entire family of R-trees is not ready to grow in the big data forest due to fundamental limitations in their design. To overcome these limitations, we propose three new indexes, namely, R-Grove, R*-Grove, and RR*-Grove, which are fundamentally modified to work with big data while inheriting the main characteristics of their traditional index counterparts. With all the proposed indexes publicly available as open source, we hope that these new indexes will be adopted by the community to better serve big spatial data research.
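To make the sample-based technique mentioned in the abstract concrete, the following is a minimal Python sketch of the general idea: draw a sample, group it into leaf-sized boxes, and route every input record to one of those boxes. It is an illustration under stated assumptions, not the authors' implementation. In particular, a simple Sort-Tile-Recursive (STR) style packing stands in for the R-tree, R*-tree, and RR*-tree insertion algorithms studied in the paper, and the function names, the `capacity` and `sample_size` parameters, and the least-enlargement routing rule are all hypothetical choices made for this example.

```python
import math
import random


def str_leaf_boxes(points, capacity):
    """Pack points into leaves STR-style; return each leaf's MBR as (x1, y1, x2, y2).

    Assumption: this STR-style packing is a stand-in for building a real
    R-tree on the sample; only the leaf boundaries are kept.
    """
    num_leaves = math.ceil(len(points) / capacity)
    num_slices = math.ceil(math.sqrt(num_leaves))
    slice_size = num_slices * capacity  # points per vertical slice
    by_x = sorted(points)  # sort by x (ties broken by y)
    boxes = []
    for i in range(0, len(by_x), slice_size):
        # Within each vertical slice, sort by y and cut into leaves.
        vertical_slice = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical_slice), capacity):
            leaf = vertical_slice[j:j + capacity]
            xs = [p[0] for p in leaf]
            ys = [p[1] for p in leaf]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def assign_partition(point, boxes):
    """Return the index of the box that needs the least area enlargement to cover point."""
    x, y = point

    def enlargement(box):
        x1, y1, x2, y2 = box
        grown_area = (max(x2, x) - min(x1, x)) * (max(y2, y) - min(y1, y))
        return grown_area - (x2 - x1) * (y2 - y1)

    return min(range(len(boxes)), key=lambda i: enlargement(boxes[i]))


def partition_by_sample(points, sample_size, capacity, seed=0):
    """Derive partition boundaries from a random sample, then route all points."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    boxes = str_leaf_boxes(sample, capacity)
    partitions = [[] for _ in boxes]
    for p in points:
        partitions[assign_partition(p, boxes)].append(p)
    return boxes, partitions


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.random(), rng.random()) for _ in range(1000)]
    boxes, partitions = partition_by_sample(data, sample_size=100, capacity=10)
    print(len(boxes), "partitions; sizes:", sorted(len(p) for p in partitions))
```

Printing the partition sizes is a quick way to see how evenly boundaries derived from a sample split the full input for a given dataset and sample size.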

Full Text