Hilbert Curve and Cassandra Based Indexing and Storing Approach for Large-Scale Spatiotemporal Data

Bisong Cao, Feng Huasen, Liang Ji, Xiang Li

doi:10.13203/j.whugis20200367

Abstract

Objectives: Real-time spatiotemporal data are being acquired at a rapidly growing rate for applications such as smart cities and real-time air-quality monitoring, and traditional database technologies cannot meet the resulting demands for large-scale data indexing, querying, and storage. NoSQL databases, which are scalable and offer fast input/output, are a viable alternative.

Methods: We propose an approach based on the Hilbert curve and Cassandra for efficiently indexing and storing large-scale spatiotemporal datasets, aiming to provide an effective framework for processing, querying, and analyzing large amounts of data with spatial and temporal features. Vehicle trajectory datasets, for example, contain valuable spatial and temporal features that are used in real-world applications. The collected spatiotemporal datasets are preprocessed to fit the proposed structures for different applications. Two types of query are commonly used in practice: the spatiotemporal range query and the query by vehicle ID. Two corresponding index structures are designed and implemented to serve these queries. The S2 Geometry Library, open-sourced by Google, is used to divide the Earth's surface into grid cells, and data points that fall within a cell are assigned that cell's ID as their key. The keys and columns are designed using the Hilbert curve and Cassandra so that the resulting structures physically store spatially neighboring data points close to each other, which makes them better suited to large-scale spatiotemporal querying and analysis.

Results: Datasets acquired from real applications are used in computational experiments to validate the efficiency of the proposed approach. Query efficiency and the time required to store large amounts of spatiotemporal data are investigated and benchmarked against several existing database technologies.

Conclusions: The computational experiments show that the proposed approach outperforms the existing methods: the time required to store (insert) data in the database is reduced by a factor of 6, and the time needed to query data is reduced by a factor of at least 10. The efficiency of the proposed methodology is further validated by applying it to queries over the trajectories of vehicles that gather real-time air-quality data.
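
To illustrate why keys ordered along a Hilbert curve keep neighboring points close together in storage, the following is a minimal sketch and not the authors' implementation. It assumes a flat 2^order by 2^order longitude/latitude grid rather than the S2 cell hierarchy, and the names `hilbert_index`, `lonlat_to_cell`, and `cell_key` are hypothetical. It uses the standard iterative mapping from grid coordinates to a position along the curve.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two, and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def lonlat_to_cell(lon: float, lat: float, order: int) -> tuple[int, int]:
    """Map a lon/lat pair to a grid cell.

    Assumption: a simple equirectangular grid, NOT the S2 projection.
    """
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return x, y


def cell_key(lon: float, lat: float, order: int) -> int:
    """Hypothetical spatial key: the Hilbert index of the containing cell."""
    x, y = lonlat_to_cell(lon, lat, order)
    return hilbert_index(1 << order, x, y)
```

Cells with consecutive indices are always adjacent on the grid, so a store that sorts rows by such a key tends to place spatially nearby points in nearby rows. The key and column layout actually used with Cassandra is defined in the paper itself and is not reproduced here.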
