With the advent of IoT and emerging 5G technology, real-time streaming data are being generated at unprecedented speed and volume, and carry both temporal and spatial dimensions. Effective analysis at such scale and speed requires support for adjusting queries dynamically in real time. In the spatio-temporal domain, this calls for both data and query optimization strategies, especially for objects with changing motion states. Contemporary distributed spatio-temporal data stream management systems are mostly dominated by the specified-once-applied-continuously query model. Any modification of query state requires a query restart, which limits system responsiveness and produces outdated or, in the worst case, erroneous results. In this paper, we adapt principles from streaming databases, spatial data management, and distributed computing to support dynamic spatio-temporal query processing over high-velocity big data streams. We first formulate a set of spatio-temporal data types and functions to seamlessly handle changes in distributed query states. We then develop a comprehensive set of streaming spatio-temporal querying methods and propose geohash-based dynamic spatial partitioning for effective parallel processing. We implement a prototype on top of Apache Flink, whose in-memory stream processing fits well with our spatio-temporal models. Comparative evaluation of our prototype demonstrates the effectiveness of our strategy: it maintains consistently high processing rates for both stationary and moving queries over high-velocity spatio-temporal big data streams.