Abstract

Over the past decade, we have witnessed exponential growth in the density (petabyte-level) and breadth (across geo-distributed datacenters) of data distribution. It becomes increasingly challenging but imperative to minimize the response times of data analytic queries over multiple geo-distributed datacenters. However, existing scheduling-based solutions have largely been motivated by pre-established mantras (e.g., bandwidth scarcity). Without data-driven insights into performance bottlenecks at runtime, schedulers might blindly assign tasks to workers that are suffering from unidentified bottlenecks. In this paper, we present Lube, a system framework that minimizes query response times by detecting and mitigating bottlenecks at runtime. Lube monitors geo-distributed data analytic queries in real time, detects potential bottlenecks, and mitigates them with a bottleneck-aware scheduling policy. Our preliminary experiments on a real-world prototype across Amazon EC2 regions have shown that Lube can detect bottlenecks with over 90 percent accuracy, and reduce the median query response time by up to 33 percent compared to Spark's built-in locality-based scheduler.
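
To make the idea of bottleneck-aware scheduling concrete, the following is a minimal Python sketch of how a scheduler might trade data locality against predicted bottleneck severity when assigning a task. It is not the authors' implementation: the names (Worker, pick_worker), the severity score in [0, 1], and the 0.5 threshold are illustrative assumptions, standing in for whatever runtime bottleneck detector and policy Lube actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Worker:
    worker_id: str
    # Predicted bottleneck severity in [0, 1], assumed to come from a
    # runtime detector over resource metrics (CPU, memory, network, disk).
    severity: float


def pick_worker(input_locations: List[str],
                workers: Dict[str, Worker],
                severity_threshold: float = 0.5) -> Worker:
    """Choose a worker for one task: prefer data-local workers that are not
    predicted to be bottlenecked; otherwise fall back to the
    least-bottlenecked worker overall."""
    local = [workers[wid] for wid in input_locations if wid in workers]
    healthy_local = [w for w in local if w.severity < severity_threshold]
    candidates = healthy_local or list(workers.values())
    # Lowest predicted severity wins; ties prefer data-local workers.
    return min(candidates,
               key=lambda w: (w.severity, w.worker_id not in input_locations))


if __name__ == "__main__":
    cluster = {
        "us-east-1a": Worker("us-east-1a", severity=0.8),  # bottlenecked
        "us-west-2b": Worker("us-west-2b", severity=0.1),
        "eu-west-1c": Worker("eu-west-1c", severity=0.3),
    }
    # The input data lives on the bottlenecked worker, so the scheduler
    # gives up locality in favor of a less-bottlenecked worker.
    chosen = pick_worker(["us-east-1a"], cluster)
    print(chosen.worker_id)  # -> "us-west-2b"
```

A purely locality-based scheduler (as in stock Spark) would pick "us-east-1a" here; the bottleneck-aware policy instead routes the task away from the congested worker, which is the behavior the abstract credits for the reduction in query response time.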
