Optimizing Timeliness and Cost in Geo-Distributed Streaming Analytics

Benjamin Heintz, Abhishek Chandra, Ramesh K Sitaraman

doi:10.1109/tcc.2017.2750678

Abstract

Rapid data streams are generated continuously from diverse sources including users, devices, and sensors located around the globe. This results in the need for efficient geo-distributed streaming analytics to extract timely information. A typical geo-distributed analytics service uses a hub-and-spoke model, comprising multiple edges connected by a wide-area-network (WAN) to a central data warehouse. In this paper, we focus on the widely used primitive of windowed grouped aggregation, and examine the question of how much computation should be performed at the edges versus the center. We develop algorithms to optimize two key metrics: WAN traffic and staleness (delay in getting results). We present a family of optimal offline algorithms that jointly minimize these metrics, and we use these to guide our design of practical online algorithms based on the insight that windowed grouped aggregation can be modeled as a caching problem where the cache size varies over time. We evaluate our algorithms through an implementation in Apache Storm deployed on PlanetLab. Using workloads derived from anonymized traces of a popular analytics service from a large commercial CDN, our experiments show that our online algorithms achieve near-optimal traffic and staleness for a variety of system configurations, stream arrival rates, and queries.
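To make the edge-versus-center question concrete, the following is a minimal illustrative sketch and not the algorithms from the paper. It assumes tumbling windows, a sum aggregate, and hypothetical helper names (`batch_at_edge`, `stream_from_edge`, `merge_at_center`). It compares two extremes: forwarding every record over the WAN immediately, and holding partial aggregates at the edge until the window has been fully seen.

```python
from collections import defaultdict


def batch_at_edge(records, window):
    """Pure batching (assumed extreme): combine records per (window, key) at the edge.

    `records` is an iterable of (timestamp, key, value) tuples. Returns one
    (window_start, key, partial_sum) update per key per window, which lowers
    WAN traffic but delays every update until the window has been processed.
    """
    partials = defaultdict(int)
    for ts, key, value in records:
        window_start = (ts // window) * window  # tumbling-window assignment
        partials[(window_start, key)] += value
    return [(w, k, v) for (w, k), v in sorted(partials.items())]


def stream_from_edge(records, window):
    """Pure streaming (assumed extreme): forward every record without combining."""
    return [((ts // window) * window, key, value) for ts, key, value in records]


def merge_at_center(updates):
    """The center sums whatever updates arrive, per (window_start, key)."""
    totals = defaultdict(int)
    for window_start, key, value in updates:
        totals[(window_start, key)] += value
    return dict(totals)


# Made-up sample data, for illustration only.
records = [(0, "a", 1), (1, "b", 2), (3, "a", 4), (11, "a", 5), (12, "a", 1)]
batched = batch_at_edge(records, window=10)
streamed = stream_from_edge(records, window=10)

print(len(streamed), "updates sent when streaming")
print(len(batched), "updates sent when batching")
print(merge_at_center(batched) == merge_at_center(streamed))
```

Both strategies give the center the same final aggregates; they differ only in how many updates cross the WAN and when those updates can be sent. The abstract's caching view concerns the space between these extremes, that is, deciding which partial aggregates to keep at the edge and when to flush them.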

Journal: IEEE Transactions on Cloud Computing
Publication Date: Jan 1, 2020
Citations: 47
License type: publisher-specific, author manuscript

Similar Papers

Optimizing Grouped Aggregation in Geo-Distributed Streaming Analytics
Benjamin Heintz ... Abhishek Chandra
15 Jun 2015

Analytics Ecosystem Transformation: A Force for Business Model Innovation
Ying Chen ... Carl Abrams
01 Mar 2011

Optimal Online Algorithms for File-Bundle Caching and Generalization to Distributed Caching
Tiancheng Qin ... S Rasoul Etesami
ACM Transactions on Modeling and Performance Evaluation of Computing Systems | VOL. 6
27 Mar 2021

A Testbed for QoS-Based Data Analytic Service Selection in the Cloud
Md Shahinur Rahman
14 Mar 2023
