A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability

Feng Zhang, Renyi Liu, Xinyue Ye, Jingwei Zhou, Zhenhong Du

doi:10.3390/su8090926

Abstract

Sustainability research faces many challenges because environmental, urban, and regional contexts are changing rapidly at an unprecedented level of spatial granularity, which produces massive, fast-growing data and demands faster detection of spatial relationships. Spatial join is a fundamental method for making data more informative with respect to spatial relations. The dramatic growth of data volumes has led to increased focus on high-performance large-scale spatial join. In this paper, we present Spatial Join with Spark (SJS), a high-performance algorithm that uses a simple but efficient uniform spatial grid to partition datasets and joins the partitions with the built-in join transformation of Spark. SJS utilizes the distributed in-memory iterative computation of Spark and introduces a calculation-evaluating model and an in-memory spatial repartition technique, which optimize the initial partition by evaluating the calculation amount of local join algorithms without any disk access. We compare four in-memory spatial join algorithms in SJS for further performance improvement. Based on extensive experiments with real-world data, we conclude that SJS outperforms the Spark and MapReduce implementations of earlier spatial join approaches. This study demonstrates that leveraging high-performance computing for large-scale spatial join analysis is promising. The availability of large geo-referenced datasets, together with high-performance computing technology, creates opportunities for sustainability research to examine whether and how these new trends in data and technology can help detect trends and patterns in human-environment dynamics.
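The grid-partition-then-join idea described in the abstract can be illustrated with a minimal single-machine sketch. This is not the SJS implementation: the function names (`cells_for`, `intersects`, `grid_spatial_join`), the use of axis-aligned bounding boxes as geometries, and the `cell_size` parameter are all assumptions made for illustration. Each box is assigned to every uniform grid cell it overlaps, the two datasets are matched on the shared cell key (the role a keyed join would play in Spark), and candidate pairs are tested for intersection within each cell.

```python
from collections import defaultdict
from itertools import product


def cells_for(box, cell_size):
    """Return every grid cell (col, row) that an axis-aligned box overlaps."""
    xmin, ymin, xmax, ymax = box
    cols = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    rows = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return product(cols, rows)


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_spatial_join(left, right, cell_size):
    """Join two lists of boxes; returns sorted (left_index, right_index) pairs."""
    # Partition phase: key every box by the grid cells it overlaps.
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)
    for i, box in enumerate(left):
        for cell in cells_for(box, cell_size):
            left_parts[cell].append(i)
    for j, box in enumerate(right):
        for cell in cells_for(box, cell_size):
            right_parts[cell].append(j)

    # Join phase: only cells present on both sides can produce results.
    # A set removes duplicates from boxes that span several cells.
    results = set()
    for cell in left_parts.keys() & right_parts.keys():
        for i in left_parts[cell]:
            for j in right_parts[cell]:
                if intersects(left[i], right[j]):
                    results.add((i, j))
    return sorted(results)
```

The local join here is a plain nested loop, chosen only for brevity; the paper compares four in-memory local join algorithms, none of which is reproduced in this sketch.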

Highlights

Sustainability research faces many challenges because environmental, urban, and regional contexts are changing rapidly at an unprecedented level of spatial granularity, which produces massive, fast-growing data and demands faster detection of spatial relationships.
Based on an analysis of existing Hadoop-like high-performance spatial join algorithms, we found that the key factors for improving spatial join performance are: (1) simplifying the spatial partitioning algorithm to reduce preprocessing time; (2) optimizing the partition results for both CPU and memory requirements; and (3) improving the performance of the local join algorithm.
We propose an improved in-memory spatial repartition method, based on the calculation amount of local join algorithms, to refine the partition results.
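A rough sketch of cost-driven repartitioning, as outlined in the highlights above, follows. The actual calculation-evaluating model is not given on this page, so the cost function here is an assumption: the product of the two partition sizes, which corresponds to a nested-loop local join. The names `estimated_cost`, `split_region`, `repartition`, and `max_cost` are hypothetical, and splitting an overloaded region into four quadrants is one possible refinement strategy, not necessarily the one SJS uses.

```python
def estimated_cost(n_left, n_right):
    """Assumed cost model: comparisons made by a nested-loop local join."""
    return n_left * n_right


def overlaps(box, region):
    """True if a (xmin, ymin, xmax, ymax) box overlaps or touches a region."""
    return (box[0] <= region[2] and region[0] <= box[2]
            and box[1] <= region[3] and region[1] <= box[3])


def split_region(region):
    """Split a rectangular region into four equal quadrants."""
    xmin, ymin, xmax, ymax = region
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [
        (xmin, ymin, xmid, ymid),
        (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax),
        (xmid, ymid, xmax, ymax),
    ]


def repartition(partitions, max_cost):
    """Refine partitions whose estimated local join cost exceeds max_cost.

    partitions maps a region to a (left_boxes, right_boxes) pair.
    Everything stays in memory; overloaded regions are split once.
    """
    refined = {}
    for region, (left, right) in partitions.items():
        if estimated_cost(len(left), len(right)) <= max_cost:
            refined[region] = (left, right)
            continue
        for sub in split_region(region):
            sub_left = [b for b in left if overlaps(b, sub)]
            sub_right = [b for b in right if overlaps(b, sub)]
            # A sub-region with an empty side cannot produce join results.
            if sub_left and sub_right:
                refined[sub] = (sub_left, sub_right)
    return refined
```

Boxes that straddle a quadrant boundary end up in more than one sub-region, so a join built on this sketch would still need to remove duplicate result pairs afterwards.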

Summary

Introduction

Spark Parallel Computing Framework

Spatial Join Query

Hadoop-Like Spatial Join Approaches

Methods

Calculation Evaluating Model

Spatial Repartition Phase in SJS

Experiments and Evaluation

Experiment Setup and Datasets

Findings

Impact of Number of Nodes and Executor Cores

Journal: Sustainability
Publication Date: Sep 10, 2016
Citations: 41
License type: CC BY 4.0

Similar Papers

Adaptive row major order: a new space filling curve for efficient spatial join processing in the transform space
Min-Jae Lee ... Il-Yeol Song
The Journal of Systems & Software | VOL. 78
18 Nov 2004

Short Paper: Optimized Spatial Join with Grid Sub-Partitioning
Prafullata Auradkar ... Anirudh Jakati
01 Oct 2021

SJMR: Parallelizing spatial join with MapReduce on clusters
Shubin Zhang ... Jizhong Han
01 Jan 2009

A Parallel Spatial Join Processing for Distributed Spatial Databases
Myoung-Soo Kang ... Kyun Koh
01 Jan 2002