Abstract

Big data systems have become increasingly complex, making the job of a query optimizer incredibly difficult. This is due to more complicated decision making, more complex query plans, and more tedious objective functions in cloud-based big data workloads. As a result, production cloud query optimizers are often far from optimal. In this paper, we describe building a learning query optimizer for big data workloads at Microsoft. We make four major contributions. First, we describe the challenges in cloud query optimizers based on our observations from the big data workloads at Microsoft. Second, we discuss what makes machine learning an attractive approach to aid big data query optimizers in decision making. Third, we present Microlearner, a practical approach to characterize large cloud workloads into smaller subsets and build micromodels over each subset to tame the complexity of big data workloads. Finally, we describe the productization of Microlearner, using learned cardinality as a concrete example, presenting performance results over very large production workloads and illustrating the various challenges involved in deployment.
