Pre-training summarization models of structured datasets for cardinality estimation

Yao Lu, Surajit Chaudhuri, Arnd Christian König, Srikanth Kandula

doi:10.14778/3494124.3494127

Journal: Proceedings of the VLDB Endowment
Publication Date: Nov 1, 2021
Citations: 7

Affiliation: Microsoft Research (United Kingdom)

#Pre-training Models #Cardinality Estimation

Abstract

We consider the problem of pre-training models that convert structured datasets into succinct summaries, which can then be used to answer cardinality estimation queries. Doing so avoids per-dataset training and, in our experiments, reduces the time to construct summaries by up to 100×. When datasets change, our summaries can be updated incrementally. Our key insights are threefold: use multiple summaries per dataset; use learned summaries for columnsets on which simpler techniques do not achieve high accuracy; and, analogous to pre-trained models for images and text, exploit the fact that structured datasets share common frequency and correlation patterns, which our models learn to capture by pre-training on a large and diverse corpus of datasets.
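To make the terms concrete, the following is a minimal sketch of a cardinality estimation query answered from a simple, incrementally updateable summary. It is not the paper's method: it uses exact per-column value counts combined under an independence assumption, a classical baseline, and the names `ColumnSummary` and `estimate_cardinality` are hypothetical. It illustrates why a simple summary can be inaccurate on correlated columns, which is the case the abstract reserves for learned summaries.

```python
from collections import Counter


class ColumnSummary:
    """Per-column value counts: a stand-in for a simple, non-learned summary.

    Hypothetical illustration only; not the summaries described in the paper.
    """

    def __init__(self):
        self.counts = Counter()
        self.rows = 0

    def add(self, value):
        # Incremental update: absorbing a new row touches one counter.
        self.counts[value] += 1
        self.rows += 1

    def selectivity(self, value):
        # Fraction of rows whose column equals `value`.
        return self.counts[value] / self.rows if self.rows else 0.0


def estimate_cardinality(summaries, predicate, total_rows):
    """Estimate the rows matching a conjunction of equality predicates,
    assuming the columns are statistically independent."""
    selectivity = 1.0
    for column, value in predicate.items():
        selectivity *= summaries[column].selectivity(value)
    return selectivity * total_rows


# Toy dataset in which columns x and y are perfectly correlated.
rows = [("a", 1), ("a", 1), ("b", 2), ("b", 2)]
summaries = {"x": ColumnSummary(), "y": ColumnSummary()}
for x, y in rows:
    summaries["x"].add(x)
    summaries["y"].add(y)

estimate = estimate_cardinality(summaries, {"x": "a", "y": 1}, len(rows))
true_count = sum(1 for x, y in rows if x == "a" and y == 1)
print(estimate, true_count)
```

On this toy data the independence assumption gives an estimate of 1.0 row, while 2 rows actually match, because the two columns are correlated. A summary that captures the joint distribution of the columnset would be needed to close that gap.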
