Tripod: Harmonizing Job Scheduling and Data Caching for Analytics Frameworks

Yulai Tong, Hua Wang, Cheng Wang, Jiazhen Liu, Ke Zhou

doi:10.1109/iccd56317.2022.00095

Abstract

Modern data analytics platforms are often coupled with external data storage services such as Amazon S3, resulting in storage bottlenecks. Existing caching and prefetching solutions use higher-level information from data analytics frameworks, such as job dependency graphs (e.g., DAGs) and historical runtime information, to predict future data accesses; they then prefetch data into the cache and manage the cache contents based on those predictions. However, in doing so, they miss a fundamental opportunity: rather than caching data given a prediction of job execution, we can actually influence the job execution order to enable more effective caching and prefetching. With this key insight, we devise a set of novel heuristics and design a system, Tripod, which harmonizes job scheduling and data caching for analytics frameworks. Using the higher-level information from analytics frameworks, Tripod searches for the job execution order best suited to prefetching and caching, guided by the devised heuristics. We have implemented Tripod as extensions to Apache YARN and Tez. Our evaluation using standard analytics benchmarks (TPC-H and TPC-DS) shows that Tripod achieves up to 1.7x speedup over state-of-the-art approaches.
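The key insight, that job order itself affects how well a cache performs, can be illustrated with a toy model. The sketch below is not Tripod's scheduler and does not reproduce its heuristics, which the abstract does not describe. It assumes a hypothetical workload of four jobs (`j1` to `j4`), a cache that holds a fixed number of whole datasets, and LRU eviction; all names and parameters are illustrative assumptions.

```python
from collections import OrderedDict


def count_remote_reads(job_order, job_inputs, cache_slots):
    """Count cache misses (remote reads) for one job order under an LRU cache."""
    cache = OrderedDict()  # dataset name -> None, least recently used first
    misses = 0
    for job in job_order:
        for dataset in job_inputs[job]:
            if dataset in cache:
                cache.move_to_end(dataset)  # hit: refresh recency
            else:
                misses += 1  # miss: dataset must come from remote storage
                cache[dataset] = None
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict least recently used
    return misses


# Hypothetical workload: each job reads a single dataset.
job_inputs = {"j1": ["A"], "j2": ["B"], "j3": ["A"], "j4": ["B"]}

interleaved = ["j1", "j2", "j3", "j4"]  # alternates between datasets A and B
grouped = ["j1", "j3", "j2", "j4"]      # runs jobs sharing an input back to back

print(count_remote_reads(interleaved, job_inputs, cache_slots=1))  # 4
print(count_remote_reads(grouped, job_inputs, cache_slots=1))      # 2
```

With the same jobs and the same cache, the grouped order halves the remote reads in this toy case. A real scheduler would also have to respect job dependencies and data sizes, which this sketch ignores.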

Similar Papers

DAG-aware harmonizing job scheduling and data caching for disaggregated analytics frameworks
Yulai Tong ... Cheng Wang
Future Generation Computer Systems | VOL. 156
05 Mar 2024

Development of Novel Big Data Analytics Framework for Smart Clothing
Mominul Ahsan ... Siew Teay Hon
IEEE Access | VOL. 8
01 Jan 2020

Too Big to Eat: Boosting Analytics Data Ingestion from Object Stores with Scoop
Yosef Moatti ... Raul Gracia-Tinedo
01 Apr 2017

Developing a goal-driven data integration framework for effective data analytics
Dapeng Liu ... Victoria Y Yoon
Decision Support Systems | VOL. 180
23 Feb 2024
