R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets

Tin Vu, Ahmed Eldawy

doi:10.3389/fdata.2020.00028

Abstract

The rapid growth of big spatial data has driven the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenge of big spatial partitioning is to build partitions of high spatial quality while also taking advantage of distributed processing models by keeping those partitions load balanced. Previous work on big spatial partitioning reuses existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques addresses the mentioned challenges completely. This paper proposes a novel partitioning method, termed R*-Grove, which can partition very large spatial datasets into high-quality partitions with excellent load balance and block utilization. This property allows R*-Grove to outperform existing techniques in spatial query processing. R*-Grove can be easily integrated into any big data platform such as Apache Spark or Apache Hadoop. Our experiments show that R*-Grove outperforms the existing partitioning techniques for big spatial data systems. With all the proposed work publicly available as open source, we envision that R*-Grove will be adopted by the community to better serve big spatial data research.

Highlights

Recent years have witnessed a rapid growth of big spatial data collected by different applications such as satellite imagery (Eldawy et al., 2015b), social networks (Magdy et al., 2014), smart phones (Henke et al., 2016), and VGI (Goodchild, 2007).
This paper proposes a novel spatial partitioning technique for big data, termed R*-Grove, which completely addresses all three aforementioned limitations.
The last two groups indicate that R*-Grove significantly outperforms other partitioning techniques in terms of range query and spatial join query performance.

Summary

Introduction

Recent years have witnessed a rapid growth of big spatial data collected by different applications such as satellite imagery (Eldawy et al., 2015b), social networks (Magdy et al., 2014), smart phones (Henke et al., 2016), and VGI (Goodchild, 2007). Traditional spatial DBMS technology could not scale up to these petabytes of data, which led to the birth of many big spatial data management systems such as SpatialHadoop (Eldawy and Mokbel, 2015), GeoSpark (Yu et al., 2015), Simba (Xie et al., 2016), LocationSpark (Tang et al., 2016), and Sphinx (Eldawy et al., 2017), to name a few. Regardless of their architecture, all these systems need an essential preliminary step that partitions the data across machines before the execution can be parallelized. A common method, first introduced in SpatialHadoop (Eldawy and Mokbel, 2015), is the sample-based STR partitioner. This method picks a small sample of the input to determine its distribution, packs this sample using the STR packing algorithm (Leutenegger et al., 1997), and uses the boundaries of the leaf nodes to partition the entire data.

The SPLITNODE method takes an overflow node with M + 1 records and splits it into two nodes.
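The sample-based STR partitioner described above can be illustrated with a minimal Python sketch. This is not the SpatialHadoop or R*-Grove implementation; it assumes 2D point data, and the function names `str_leaf_boxes` and `assign_partition`, the sample size, and the rule of sending each record to the nearest leaf box are all assumptions made for illustration.

```python
import math
import random

def str_leaf_boxes(sample, num_partitions):
    """Pack 2D sample points with STR; return leaf boxes (xmin, ymin, xmax, ymax)."""
    n = len(sample)
    capacity = math.ceil(n / num_partitions)        # sample points per leaf
    num_leaves = math.ceil(n / capacity)
    num_slices = math.ceil(math.sqrt(num_leaves))   # number of vertical slices
    slice_size = num_slices * capacity              # sample points per slice
    by_x = sorted(sample)                           # sort by x first
    boxes = []
    for i in range(0, n, slice_size):
        # Within each vertical slice, sort by y and cut into runs of `capacity`.
        strip = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(strip), capacity):
            leaf = strip[j:j + capacity]
            xs = [p[0] for p in leaf]
            ys = [p[1] for p in leaf]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def assign_partition(point, boxes):
    """Assumed rule: send a record to the closest leaf box (distance 0 if inside)."""
    x, y = point
    def dist2(b):
        dx = max(b[0] - x, 0.0, x - b[2])
        dy = max(b[1] - y, 0.0, y - b[3])
        return dx * dx + dy * dy
    return min(range(len(boxes)), key=lambda i: dist2(boxes[i]))

# Usage: synthetic uniform points stand in for the full input.
random.seed(0)
data = [(random.random(), random.random()) for _ in range(10_000)]
sample = random.sample(data, 200)                   # small sample of the input
boxes = str_leaf_boxes(sample, num_partitions=16)

counts = [0] * len(boxes)
for p in data:
    counts[assign_partition(p, boxes)] += 1
print("partitions:", len(boxes), "min/max records:", min(counts), max(counts))
```

Because the boundaries come from a sample, the per-partition record counts on the full input are only approximately equal. Comparing the printed minimum and maximum gives a rough sense of the load-balance question the paper raises.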

Journal: Frontiers in Big Data
Publication Date: Aug 28, 2020
Citations: 9
License type: CC BY 4.0
