Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft

Alekh Jindal, Rathijit Sen, Shi Qiao, Hiren Patel

doi:10.1109/icde51399.2021.00275

Abstract

Big data systems have become increasingly complex, making the job of a query optimizer incredibly difficult. This is due to more complicated decision making, more complex query plans, and more tedious objective functions in cloud-based big data workloads. As a result, production cloud query optimizers are often far from optimal. In this paper, we describe building a learning query optimizer for big data workloads at Microsoft. We make four major contributions. First, we describe the challenges in cloud query optimizers based on our observations from the big data workloads at Microsoft. Second, we discuss what makes machine learning an attractive approach to aid big data query optimizers in decision making. Third, we present Microlearner, a practical approach that characterizes large cloud workloads into smaller subsets and builds micromodels over each subset to tame the complexity of big data workloads. Finally, we describe the productization of Microlearner, using learned cardinality as a concrete example, presenting performance results over very large production workloads and illustrating the various challenges involved in deployment.
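The abstract describes Microlearner only at a high level: split a large workload into smaller subsets and train a micromodel on each, with learned cardinality as the running example. The Python sketch below illustrates that general idea under stated assumptions. The subset key (a hypothetical `template` signature for a recurring query subexpression), the single numeric `feature`, the per-subset linear fit in log space, and the `min_samples` threshold are all illustrative choices, not the paper's actual features, models, or APIs.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Observation:
    template: str       # hypothetical subset key, e.g. a signature of a recurring subexpression
    feature: float      # hypothetical single feature, e.g. log of estimated input rows
    actual_rows: float  # observed output cardinality from a past run


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (0, mean(y)) if x is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return 0.0, my
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var
    return a, my - a * mx


class MicroModels:
    """One small model per workload subset (hypothetical names throughout)."""

    def __init__(self, min_samples=3):
        self.min_samples = min_samples  # assumed threshold for training a micromodel
        self.models = {}                # template -> (slope, intercept) in log space

    def train(self, observations):
        # Step 1: characterize the workload by splitting it into subsets.
        groups = defaultdict(list)
        for obs in observations:
            groups[obs.template].append(obs)
        # Step 2: fit a micromodel for each subset that has enough history.
        for template, subset in groups.items():
            if len(subset) < self.min_samples:
                continue
            xs = [o.feature for o in subset]
            ys = [math.log1p(o.actual_rows) for o in subset]  # learn log(1 + rows)
            self.models[template] = fit_line(xs, ys)

    def predict(self, template, feature, default_estimate):
        model = self.models.get(template)
        if model is None:
            # No micromodel for this subset: keep the optimizer's own estimate.
            return default_estimate
        a, b = model
        return math.expm1(a * feature + b)
```

Falling back to the optimizer's default estimate for unseen or sparsely observed subsets keeps each learned model confined to the slice of the workload it was trained on, which is the intuition behind fine-grained micromodels as opposed to one global model.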

Similar Papers

Characterizing and subsetting big data workloads
Zhen Jia ... Sally A Mckee
01 Oct 2014

Big data and HPC collocation: Using HPC idle resources for Big Data analytics
Michael Mercier ... Olivier Richard
13 Nov 2017

Replica parallelism to utilize the granularity of data
Won Gi Choi ... Sanghyun Park
17 Oct 2016

Scalable system scheduling for HPC and big data
Albert Reuther ... Jeremy Kepner
Journal of Parallel and Distributed Computing | VOL. 111
08 Aug 2017