SP-Phoenix: A Massive Spatial Point Data Management System Based on Phoenix

Longhai Li, Wendong Liu, Chengqiang Huang, Zhaoyu Zhong

doi:10.1109/hpcc/smartcity/dss.2018.00266

Abstract

The NoSQL database HBase is widely used to build data management and data warehouse systems, primarily because of its scalability, fault tolerance, throughput, and distributed processing ability. However, HBase does not directly support storing and retrieving spatial data. We designed SP-Phoenix, a data management system for massive spatial point data, on top of two open-source projects, Phoenix and HBase. SP-Phoenix is highly scalable and fault tolerant, and it supports flexible access to its spatial data through an extended SQL language. Using geohash-based spatial indexes, SP-Phoenix implements several basic spatial query operations, including rectangular range queries, non-regular area queries, and k-Nearest-Neighbor (kNN) queries, all of which are essential primitives for building complex spatial queries. SP-Phoenix uses the user-defined functions and the server-side aggregation and sorting mechanisms offered by Phoenix to push most spatial filtering work to the server side during query processing, which reduces the computing burden on the client. SP-Phoenix also applies a query optimization method based on spatial distribution statistics, which further improves spatial query efficiency. Experimental evaluations show that SP-Phoenix deployed on a small-scale cluster can sustain a throughput of hundreds of thousands of data insertions per second while serving spatial range queries and kNN queries with response times as low as hundreds of milliseconds. The experiments demonstrate that SP-Phoenix is applicable to a wide spectrum of location-related applications that demand high insertion throughput and real-time spatial queries.
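
To illustrate the idea behind a geohash-based spatial index, the following is a minimal sketch of the standard geohash encoding in Python. It is not taken from SP-Phoenix; the function name `geohash_encode` and its parameters are assumptions made for illustration, and the abstract does not state how SP-Phoenix lays out its row keys. The sketch only shows why geohashes suit a sorted key-value store such as HBase: points in the same cell share a string prefix, so a spatial cell maps to a contiguous key range.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string.

    Illustrative helper (hypothetical name, not SP-Phoenix code).
    Bits alternate between longitude and latitude, starting with
    longitude; every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # accumulator for the current 5-bit group
    bit_count = 0   # number of bits currently in the accumulator
    use_lon = True  # the first bit always refines longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    # A coarser geohash is always a prefix of a finer one for the same point.
    print(geohash_encode(42.6, -5.6, 5))
    print(geohash_encode(42.6, -5.6, 3))
```

Under this scheme, a rectangular range query can be approximated by a small set of prefix scans over the cells that cover the rectangle, followed by an exact filter on the returned points; whether SP-Phoenix does exactly this is not specified in the abstract.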

Full Text