Abstract

Large volumes of geo-referenced data streams arrive daily at stream processing systems, owing to the abundance of affordable Internet of Things (IoT) devices. Practitioners, in turn, want to exploit these IoT data streams for strategic decision-making. However, mobility data are highly skewed and their arrival rates fluctuate, which poses an extra challenge for data stream processing systems that must meet pre-specified latency and accuracy goals. In this paper, we propose ApproxSSPS, a system for approximate processing of geo-referenced mobility data at scale with quality of service (QoS) guarantees. We focus on stateful aggregations (e.g., means, counts) and top-N queries. ApproxSSPS features a controller that interactively learns the latency statistics and calculates appropriate sampling rates to meet latency and/or accuracy targets. A key trait of ApproxSSPS is its ability to strike a reasonable balance between latency and accuracy targets. We evaluate ApproxSSPS on Apache Spark Structured Streaming with real mobility data and compare it against a state-of-the-art online adaptive processing system. Our extensive experiments show that ApproxSSPS can fulfill latency and accuracy targets under varying parameter configurations and load intensities (i.e., transient peaks in data loads versus slowly arriving streams). Moreover, our results show that ApproxSSPS outperforms the baseline by a significant margin. In short, ApproxSSPS is a novel spatial data stream processing system that can deliver accurate results in a timely manner by dynamically setting limits on data samples.
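
To make the idea of a latency-driven sampling controller concrete, the following is a minimal sketch of a feedback loop that lowers the sampling fraction when observed batch latency exceeds a target and raises it again when there is headroom. It is not the ApproxSSPS controller, whose algorithm is not described in this abstract. The class name `SamplingController`, its parameters, the exponentially weighted moving average, and the assumption that latency scales roughly linearly with the sampled fraction are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingController:
    """Hypothetical feedback controller; NOT the ApproxSSPS algorithm."""

    target_latency_ms: float
    min_fraction: float = 0.05  # assumed floor so accuracy does not collapse
    max_fraction: float = 1.0   # 1.0 means "process every tuple"
    smoothing: float = 0.5      # EWMA weight given to the newest observation
    fraction: float = 1.0       # sampling fraction currently in effect
    ewma_ms: Optional[float] = None  # smoothed latency learned so far

    def update(self, observed_latency_ms: float) -> float:
        """Record one batch latency and return the next sampling fraction."""
        if self.ewma_ms is None:
            self.ewma_ms = observed_latency_ms
        else:
            self.ewma_ms = (
                self.smoothing * observed_latency_ms
                + (1.0 - self.smoothing) * self.ewma_ms
            )
        if self.ewma_ms <= 0.0:
            return self.fraction  # nothing meaningful to learn from
        # Assumption: latency is roughly proportional to the sampled fraction,
        # so scale the fraction by target / smoothed latency.
        proposed = self.fraction * self.target_latency_ms / self.ewma_ms
        self.fraction = min(self.max_fraction, max(self.min_fraction, proposed))
        return self.fraction


if __name__ == "__main__":
    controller = SamplingController(target_latency_ms=100.0)
    # A transient spike followed by a quieter period (made-up latencies).
    for latency in [200.0, 180.0, 90.0, 40.0, 40.0]:
        print(latency, round(controller.update(latency), 3))
```

In such a loop, the returned fraction would be applied to the next batch by whatever sampling operator the host system provides. The floor on the fraction is one simple way to keep an accuracy target from being sacrificed entirely during a spike.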

Highlights

  • Large amounts of geo-referenced data streams are generated daily from Internet of Things (IoT) devices in high-traffic dynamic smart cities [1]

  • We show the efficiency of ApproxSSPS in terms of its ability to achieve latency and/or accuracy quality of service (QoS) goals

  • We show the effectiveness of the latency controller of ApproxSSPS in fulfilling latency QoS goals while preserving the stability of the system during transient spikes in data arrival rates

Summary

Introduction

Large amounts of geo-referenced data streams are generated daily from IoT devices in high-traffic dynamic smart cities [1]. For example, meteorological data are joined with vehicle mobility data in metropolitan cities so that municipalities can identify areas with higher levels of vehicle-caused air pollutants, such as particulate matter (PM2.5 and PM10). Such dynamic smart city applications are made possible only by technologies that operate synergistically, including Cloud, Edge, and Fog computing, working on big data coming from the Internet of Things (IoT) [3]. Various smart cities have been designed worldwide, but few remain sustainable. This is because the size and variety of data are changing at a pace that far exceeds pre-planned IT infrastructure and capacity.

