Abstract

Sampling-based approximate query processing (AQP) lets users trade accuracy for execution time by running an analytical query on a sample of the data rather than on the whole dataset. AQP is widely adopted to support Big Data analysis efficiently, and there are two major approaches: (1) online aggregation based on the central limit theorem (CLT), and (2) the bootstrap method. The first is time-efficient but applies only to simple aggregation queries, while the second is general but incurs high computation overhead. Both methods can also suffer from estimation failure. To date, no technique both supports a broad range of query categories and runs in acceptable time. To make AQP more general and efficient, we propose a hybrid approximate query framework called AQP++ that combines the advantages of both methods and eliminates their limitations as far as possible. We have implemented an AQP++ prototype and conducted extensive experiments on the TPC-H benchmark. Our results demonstrate the effectiveness and efficiency of AQP++.
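To make the contrast between the two approaches concrete, the following is a minimal Python sketch, not the AQP++ implementation. It assumes a uniform random sample held in memory. The function names `clt_interval` and `bootstrap_interval` are hypothetical and chosen only for illustration. The CLT estimator gives a closed-form interval for the mean in one pass, while the bootstrap estimator works for an arbitrary statistic (here the median) at the cost of recomputing it on many resamples.

```python
import math
import random
import statistics


def clt_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean, using the CLT.

    Closed form: one pass over the sample, but only valid for
    mean-like aggregates (AVG, and SUM/COUNT after scaling).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * stderr, mean + z * stderr


def bootstrap_interval(sample, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for an arbitrary statistic.

    General, but the statistic is recomputed n_resamples times,
    which is the source of the computation overhead.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        statistic(rng.choices(sample, k=n))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    # Illustrative data only: a skewed synthetic sample, not TPC-H.
    rng = random.Random(42)
    sample = [rng.expovariate(1 / 100) for _ in range(500)]
    print("CLT interval for the mean:      ", clt_interval(sample))
    print("Bootstrap interval for the median:",
          bootstrap_interval(sample, statistics.median))
```

The sketch shows why neither method suffices alone: `clt_interval` cannot be applied to the median, and `bootstrap_interval` does roughly `n_resamples` times the work for each query.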
