Comparing synopsis techniques for approximate spatial data analysis

A B Siddique,Vagelis Hristidis,Ahmed Eldawy

doi:10.14778/3342263.3342635

Open Access

https://doi.org/10.14778/3342263.3342635

Journal: Proceedings of the VLDB Endowment
Publication Date: Jul 1, 2019
Citations: 14
License type: cc-by-nc-nd

Affiliation: California Coast University

Abstract

The increasing amount of spatial data calls for new scalable query processing techniques. One of the techniques that is getting attention is data synopsis, which summarizes the data using samples or histograms and computes an approximate answer based on the synopsis. This general technique is used in selectivity estimation, clustering, partitioning, load balancing, and visualization, among others. This paper experimentally studies four spatial data synopsis techniques for three common data analysis problems, namely, selectivity estimation, k-means clustering, and spatial partitioning. We run an extensive experimental evaluation on both real and synthetic datasets of up to 2.7 billion records to study the trade-offs between the synopsis methods and their applicability in big spatial data analysis. For each of the three problems, we compare with baseline techniques that operate on the whole dataset and evaluate the synopsis generation time, the time for computing an approximate answer on the synopsis, and the accuracy of the result. We present our observations about when each synopsis technique performs best.
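
As an illustration of the general idea only, the sketch below estimates the selectivity of a rectangular range query from two simple synopses: a uniform random sample and a uniform grid histogram. It is not the paper's implementation and does not claim to reproduce the four techniques it evaluates. The function names, the unit-square domain, and the assumption that points are spread uniformly inside each histogram cell are all choices made for this example.

```python
import random

def exact_selectivity(points, box):
    """Baseline: fraction of points inside box = (x1, y1, x2, y2), full scan."""
    x1, y1, x2, y2 = box
    hits = sum(1 for x, y in points if x1 <= x <= x2 and y1 <= y <= y2)
    return hits / len(points)

def sample_synopsis(points, size, seed=0):
    """Synopsis 1 (assumed for illustration): uniform sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, size)

def grid_histogram(points, cells):
    """Synopsis 2 (assumed for illustration): cells x cells counts over [0, 1] x [0, 1]."""
    counts = [[0] * cells for _ in range(cells)]
    for x, y in points:
        i = min(int(x * cells), cells - 1)  # clamp x == 1.0 into the last cell
        j = min(int(y * cells), cells - 1)
        counts[i][j] += 1
    return counts

def histogram_selectivity(counts, box, total):
    """Estimate selectivity, assuming points are uniform within each cell."""
    cells = len(counts)
    w = 1.0 / cells
    x1, y1, x2, y2 = box
    estimate = 0.0
    for i in range(cells):
        # Length of the overlap between the query and cell column i along x.
        overlap_x = max(0.0, min(x2, (i + 1) * w) - max(x1, i * w))
        if overlap_x == 0.0:
            continue
        for j in range(cells):
            overlap_y = max(0.0, min(y2, (j + 1) * w) - max(y1, j * w))
            # Each cell contributes its count scaled by the covered area fraction.
            estimate += counts[i][j] * (overlap_x * overlap_y) / (w * w)
    return estimate / total

if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.random(), rng.random()) for _ in range(20000)]
    query = (0.2, 0.3, 0.6, 0.7)
    print("exact    :", exact_selectivity(data, query))
    print("sample   :", exact_selectivity(sample_synopsis(data, 1000), query))
    print("histogram:", histogram_selectivity(grid_histogram(data, 10), query, len(data)))
```

Both estimates are computed without touching the full dataset at query time, which is the trade-off the abstract describes: a one-time synopsis generation cost in exchange for faster, approximate answers.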
