ShadowAQP: Efficient Approximate Group-by and Join Query via Attribute-Oriented Sample Size Allocation and Data Generation

Haoran Cai, Jiaqi Zheng, Han Li, Meng Li, Rong Gu, Guihai Chen, Haipeng Dai, Yihua Huang, Wenjie Huang, Jie Xue

doi:10.14778/3625054.3625059

Abstract

Approximate query processing (AQP) is a key technique for querying big data because it returns approximate answers efficiently. To address the non-trivial sample selection and heavy sampling cost of AQP, we propose ShadowAQP, an efficient and accurate approach based on attribute-oriented sample size allocation and data generation. We select samples according to the group-by and join attributes, and determine the sample size for each group of unique value combinations to improve query accuracy. We design a conditional variational autoencoder model with automatic table data encoding and model update strategies. To further improve accuracy and efficiency, we propose a set of extensions, including parallel multi-round sampling aggregation, outlier-aware sampling, and dimension reduction optimization. Evaluation results on diverse datasets show that, compared with state-of-the-art approaches, ShadowAQP improves query speed by 5.8× on average (up to 12.8×) while reducing query error by 74% on average (up to 95%).
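
To make the idea of per-group sample size allocation concrete, the following is a minimal Python sketch, not the allocation rule used by ShadowAQP. It assumes simple proportional allocation with a minimum of one sample per group, and it draws samples from the stored rows instead of generating them with a conditional variational autoencoder. The names `allocate_sample_sizes`, `approximate_group_sum`, `budget`, and `min_per_group` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def allocate_sample_sizes(group_counts, budget, min_per_group=1):
    """Split a total sample budget across groups.

    Assumption: proportional allocation with a floor, so that small
    groups still receive at least `min_per_group` samples.
    """
    total = sum(group_counts.values())
    sizes = {}
    for group, count in group_counts.items():
        share = round(budget * count / total)
        # Never request more rows than the group actually holds.
        sizes[group] = min(max(min_per_group, share), count)
    return sizes


def approximate_group_sum(rows, key, value, budget, seed=0):
    """Estimate SUM(value) GROUP BY key from a per-group sample."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])

    counts = {group: len(values) for group, values in groups.items()}
    sizes = allocate_sample_sizes(counts, budget)

    estimates = {}
    for group, values in groups.items():
        sample = rng.sample(values, sizes[group])
        # Scale the sample mean by the true group size.
        estimates[group] = sum(sample) / len(sample) * counts[group]
    return estimates
```

With the floor in place, a group that holds only a handful of rows still appears in the approximate result, which plain uniform sampling over the whole table does not guarantee.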

Full Text