Cruncher: Distributed in-memory processing for location-based services

Ahmed S Abdelhamid,Ahmed M Aly,Saleh Basalamah,Mingjie Tang,Thamir Qadah,Ahmed R Mahmood,Walid G Aref

doi:10.1109/icde.2016.7498356

Abstract

Advances in location-based services (LBS) demand high-throughput processing of both static and streaming data. Recently, many systems have been introduced to support distributed main-memory processing to maximize the query throughput. However, these systems are not optimized for spatial data processing. In this demonstration, we showcase Cruncher, a distributed main-memory spatial data warehouse and streaming system. Cruncher extends Spark with adaptive query processing techniques for spatial data. Cruncher uses dynamic batch processing to distribute the queries and the data streams over commodity hardware according to an adaptive partitioning scheme. The batching technique also groups and orders the overlapping spatial queries to enable inter-query optimization. Both the data streams and the offline data share the same partitioning strategy that allows for data co-locality optimization. Furthermore, Cruncher uses an adaptive caching strategy to maintain the frequently-used location data in main memory. Cruncher maintains operational statistics to optimize query processing, data partitioning, and caching at runtime. We demonstrate two LBS applications over Cruncher using real datasets from OpenStreetMap and two synthetic data streams. We demonstrate that Cruncher achieves order(s) of magnitude throughput improvement over Spark when processing spatial data.

Full Text