Abstract

In stream processing, data is streamed as a continuous flow of data items, which are generated from multiple sources and geographical locations. The common approach for stream processing is to transfer raw data streams to a central data center that entails communication over the wide-area network (WAN). However, this approach is inefficient and falls short for two main reasons: i) the burst in the amount of data generated at the network edge by an increasing number of connected devices, ii) the emergence of applications with predictable and low latency requirements. In this paper, we propose SpanEdge, a novel approach that unifies stream processing across a geo-distributed infrastructure, including the central and near-the-edge data centers. SpanEdge reduces or eliminates the latency incurred by WAN links by distributing stream processing applications across the central and the near-the-edge data centers. Furthermore, SpanEdge provides a programming environment, which allows programmers to specify parts of their applications that need to be close to the data source. Programmers can develop a stream processing application, regardless of the number of data sources and their geographical distributions. As a proof of concept, we implemented and evaluated a prototype of SpanEdge. Our results show that SpanEdge can optimally deploy the stream processing applications in a geo-distributed infrastructure, which significantly reduces the bandwidth consumption and the response latency.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call