LocationSpark: In-memory Distributed Spatial Query Processing and Optimization.

Mingjie Tang, Mourad Ouzzani, Yongyang Yu, Ahmed R Mahmood, Qutaibah M Malluhi, Walid G Aref

doi:10.3389/fdata.2020.00030

Abstract

Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques that handle the query skew that commonly occurs in practice and that minimize communication costs. We propose a distributed query scheduler that uses a new cost model to minimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. The query scheduler uses new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. Our prototype system is termed LocationSpark. The experimental study is based on real datasets and demonstrates that LocationSpark can speed up distributed spatial query processing by up to an order of magnitude over existing in-memory and distributed spatial systems.
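The idea of using a bitmap filter to decide which local nodes should receive a query can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the sFilter itself: the hypothetical `GridBitmapFilter` class uses a uniform n × n grid over one partition's bounding box, whereas the paper's sFilter has its own binary encoding and query-aware adaptivity. A cleared bit means that cell holds no data, so a range query whose covered cells are all cleared need not be forwarded to that partition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GridBitmapFilter:
    """Hypothetical, simplified spatial bitmap filter for one data partition.

    The partition's bounding box is split into a uniform n x n grid, and bit
    (i * n + j) is set when grid cell (i, j) holds at least one data point.
    This grid layout is an illustrative assumption, not the sFilter encoding.
    """
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    n: int = 8
    bits: int = 0

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        # Map a coordinate to grid-cell indices, clamped to the grid edges.
        cx = int((x - self.min_x) / (self.max_x - self.min_x) * self.n)
        cy = int((y - self.min_y) / (self.max_y - self.min_y) * self.n)
        return min(max(cx, 0), self.n - 1), min(max(cy, 0), self.n - 1)

    def add(self, x: float, y: float) -> None:
        # Mark the cell containing (x, y) as non-empty.
        i, j = self._cell(x, y)
        self.bits |= 1 << (i * self.n + j)

    def may_overlap(self, qx1: float, qy1: float, qx2: float, qy2: float) -> bool:
        """Return False only if the range query surely has no answers here.

        Assumes qx1 <= qx2 and qy1 <= qy2.
        """
        # Query rectangle lies entirely outside the partition's bounding box.
        if qx2 < self.min_x or qx1 > self.max_x or qy2 < self.min_y or qy1 > self.max_y:
            return False
        i1, j1 = self._cell(qx1, qy1)
        i2, j2 = self._cell(qx2, qy2)
        # Forward the query if any covered cell is non-empty.
        for i in range(i1, i2 + 1):
            for j in range(j1, j2 + 1):
                if (self.bits >> (i * self.n + j)) & 1:
                    return True
        return False
```

For example, a filter over (0, 0) to (100, 100) with n=4 that contains only (10, 10) and (90, 90) reports `True` for the query rectangle (5, 5, 15, 15) and `False` for (40, 40, 60, 60). Because the filter records only which cells are non-empty, it can yield false positives (a set cell whose points lie outside the query rectangle) but never false negatives, which is the property a query router needs.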

Highlights

Spatial computing is becoming increasingly important with the proliferation of mobile devices
LocationSpark searches only the data partitions that can contribute to the kNN results of a query point, based on the global and local spatial indexes and the sFilter
We present LocationSpark, a Spark-based query executor and optimizer that improves the query execution plans generated for spatial queries
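Searching only the partitions that can contribute to a kNN answer can be illustrated with a standard branch-and-bound sketch. It assumes each partition is summarized only by its bounding box; the hypothetical `mindist` and `knn_with_pruning` helpers are illustrative and ignore the local indexes and the sFilter that LocationSpark also uses. Partitions are visited in order of their minimum distance to the query point, and the search stops once that distance exceeds the current k-th nearest distance.

```python
import heapq
import math


def mindist(q, box):
    """Smallest Euclidean distance from point q to the box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    dx = max(x1 - q[0], 0.0, q[0] - x2)
    dy = max(y1 - q[1], 0.0, q[1] - y2)
    return math.hypot(dx, dy)


def knn_with_pruning(q, partitions, k):
    """Return (k nearest points sorted by distance, ids of partitions scanned).

    partitions: list of (partition_id, bounding_box, points); hypothetical layout.
    """
    order = sorted(partitions, key=lambda p: mindist(q, p[1]))
    best = []  # max-heap of the current k candidates, stored as (-distance, point)
    scanned = []
    for pid, box, points in order:
        # Every remaining partition is at least this far away, so stop early.
        if len(best) == k and mindist(q, box) > -best[0][0]:
            break
        scanned.append(pid)
        for p in points:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))  # evict the current farthest candidate
    result = [p for _, p in sorted(best, key=lambda t: -t[0])]
    return result, scanned
```

With one partition near the query point and another about 100 units away, a 2-NN query that is fully answered by the near partition never scans the far one; raising k beyond the near partition's size forces the far partition to be scanned as well.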

Summary

INTRODUCTION

Spatial computing is becoming increasingly important with the proliferation of mobile devices. MapReduce-based systems allow users to run spatial queries using predefined high-level spatial operators without worrying about fault tolerance or computation distribution. These systems have two main limitations: (1) they do not leverage the power of distributed memory, and (2) they are unable to reuse intermediate data (Zaharia, 2016). A kNN join [Figure 1 (right)] returns, for each query point q ∈ Q, the k nearest neighbors of q in the dataset D. Both spatial operators are expensive and may incur computation skew in certain workers, greatly degrading the overall performance. Consider a large spatial dataset, with millions of points of interest (POIs), that is partitioned across computation nodes based on the spatial distribution of the data, e.g., one data partition represents data from San Francisco, CA, and another represents data from Chicago, IL.
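The kNN join definition can be written directly as a brute-force sketch. `knn_join` and `partition_load` are hypothetical helpers for illustration only: the brute-force join is exactly the all-pairs work that a distributed system avoids by partitioning D spatially, and counting query points per partition is only a crude proxy for the computation skew described here, not the paper's cost model.

```python
import heapq
import math
from collections import Counter


def knn_join(Q, D, k):
    """For every query point q in Q, return its k nearest neighbors in D.

    Brute force, O(|Q| * |D| * log k); points are (x, y) tuples.
    """
    return {q: heapq.nsmallest(k, D, key=lambda p: math.dist(q, p)) for q in Q}


def partition_load(Q, partitions):
    """Count the query points falling in each partition's bounding box.

    partitions: dict mapping a partition id to (x1, y1, x2, y2); a point on a
    shared border is assigned to the first matching partition.
    """
    load = Counter()
    for q in Q:
        for pid, (x1, y1, x2, y2) in partitions.items():
            if x1 <= q[0] <= x2 and y1 <= q[1] <= y2:
                load[pid] += 1
                break
    return load
```

If most query points land in the San Francisco partition, the worker holding that partition does most of the join work while the Chicago worker sits idle, which is the kind of imbalance a skew-aware scheduler tries to even out.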

Data Model and Spatial Operators

Overview of In-memory Distributed Spatial Query Processing in LocationSpark

Challenges

QUERY PLAN SCHEDULER

The Cost Model

Execution Plan Generation

A Greedy Algorithm

LOCAL EXECUTION

Spatial Range Join

SPATIAL BITMAP FILTER

Binary Encoding of the sFilter

Query Processing Using the sFilter

Query-Aware Adaptivity of the sFilter

PERFORMANCE STUDY

Experimental Setup

Spatial Range Select and Join

Performance of kNN Select and Join

RELATED WORK

CONCLUSIONS

Findings

DATA AVAILABILITY STATEMENT

Journal: Frontiers in Big Data
Publication Date: Oct 16, 2020
Citations: 13
License type: CC BY 4.0

Similar Papers

Query evaluation and optimization in deductive and object-oriented spatial databases
Wei Lu ... Jiawei Han
Information and Software Technology, Vol. 37, 01 Jan 1995

A query optimization strategy for implementing multi dimensional model in spatial database system
Animesh Tripathy ... Lizashree Mishra
01 Jul 2010

A study on a spatial query processing cost prediction algorithm for multiple spatial query optimization in geosensor networks (지오센서 네트워크의 다중 공간질의 최적화를 위한 공간질의처리비용 예측 알고리즘 연구)
Min Soo Kim ... Ki Joune Li
Journal of Korea Spatial Information Society, Vol. 21, 30 Apr 2013

Haggis
Ablimit Aji ... George Teodoro
04 Nov 2014