AutoToken

Rathijit Sen, Alekh Jindal, Shi Qiao, Hiren Patel

doi:10.14778/3415478.3415554

Abstract

Right-sizing resource allocation for big data queries, particularly in serverless environments, is critical for improving infrastructure operational efficiency, capacity availability, and query performance predictability, and for reducing unnecessary wait times. In this paper, we present AutoToken, a simple and effective predictor for estimating the peak resource usage of recurring big data queries. It uses multiple query plan identifiers to identify recurring query templates and to learn models with the goal of reducing over-allocation in future instances of those queries. AutoToken is computationally light for both training and scoring, is easily deployable at scale, and is integrated with the Peregrine workload optimization infrastructure at Microsoft. We extensively evaluate AutoToken on SCOPE jobs from our production clusters and show that it outperforms state-of-the-art solutions for peak resource estimation. We also discuss our plans for supporting repeatable and extensible research on resource prediction for SCOPE jobs, including a simulation methodology for generating arbitrary-sized datasets with characteristics similar to those of the production datasets.

Journal: Proceedings of the VLDB Endowment
Publication Date: Aug 1, 2020
Citations: 17

Similar Papers

An Enhanced Query Optimization Implemented in Hadoop using Bio-Inspired Algorithm with HDFS Technique
Abhijit Banubakode et al.
International Journal on Recent and Innovation Trends in Computing and Communication | VOL. 11
02 Nov 2023

Template Based Industrial Big Data Information Extraction and Query System
Jie Wang ... Yun Lin
01 Jan 2017

Research on Big Data Storage Structure and Query Optimization
Jinhai Zhang
01 Dec 2017

PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning
Jie Song ... Ye Yuan
ACM/IMS Transactions on Data Science | VOL. 2
20 Aug 2021
