Approximate Query Processing: What is New and Where to Go?

Kaiyu Li, Guoliang Li

doi:10.1007/s41019-018-0074-4


Open Access

https://doi.org/10.1007/s41019-018-0074-4


Journal: Data Science and Engineering
Publication Date: Sep 14, 2018
Citations: 108
License type: open-access

Affiliation: Tsinghua University

Abstract

Online analytical processing (OLAP) is a core functionality in database systems. The performance of OLAP is crucial for making online decisions in many applications. However, it is rather costly to support OLAP on large datasets, especially big data, and the methods that compute exact answers cannot meet the high-performance requirement. To alleviate this problem, approximate query processing (AQP) has been proposed, which aims to efficiently find an approximate answer that is as close as possible to the exact answer. Existing AQP techniques can be broadly divided into two categories. (1) Online aggregation: select samples online and use these samples to answer OLAP queries. (2) Offline synopses generation: generate synopses offline based on a-priori knowledge (e.g., data statistics or query workload) and use these synopses to answer OLAP queries. We discuss the research challenges in AQP and summarize existing techniques to address these challenges. In addition, we review how to use AQP to support other complex data types, e.g., spatial data and trajectory data, and to support other applications, e.g., data visualization and data cleaning. We also introduce existing AQP systems and summarize their advantages and limitations. Lastly, we present research challenges and opportunities for AQP. We believe that the survey can help practitioners understand existing AQP techniques and select appropriate methods for their applications.
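To make the online aggregation idea concrete, the following is a minimal sketch rather than any method from the survey. It assumes uniform sampling without replacement from an in-memory list, a running AVG estimate, and a normal-approximation confidence interval that ignores the finite-population correction. The function name `online_avg` and its parameters are hypothetical.

```python
import math
import random


def online_avg(values, batch_size=1000, z=1.96, seed=0):
    """Yield (sample_size, estimate, half_width) after each batch of samples.

    Assumption: a random scan order stands in for online sampling, and the
    interval uses the central limit theorem (z=1.96 is roughly 95%).
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # random order = uniform sampling without replacement

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for idx in order:
        x = values[idx]
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0 or n == len(values):
            variance = m2 / (n - 1) if n > 1 else 0.0
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


if __name__ == "__main__":
    data = [float(i % 100) for i in range(10000)]
    for n, estimate, half_width in online_avg(data):
        print(f"n={n:5d}  AVG ~ {estimate:.3f} +/- {half_width:.3f}")
```

A caller can stop consuming the generator as soon as the half-width is small enough, which is the trade-off between latency and accuracy that online aggregation exposes.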

Highlights

Online analytical processing (OLAP) is a core functionality in data management and analytics systems [33]
Offline synopses generation: generate synopses offline based on a-priori knowledge and use these synopses to answer OLAP queries
With the help of previous query answers, one can learn more about the data distribution and infer the answers to new queries based on a trained statistical model

Summary

Introduction

Online analytical processing (OLAP) is a core functionality in data management and analytics systems [33]. There are several query-driven methods, including pre-computed sampling-based approximate query (PSAQ), which needs to make assumptions about the QCS or the queries, Histogram [88], Wavelet [46], and Sketch [14]. The advantages of these techniques are that the results are more accurate on skewed data and that query processing is fast (as they do not need to select samples on the fly), but they also have limitations: they cannot support general queries, especially complex nested queries. Tim Kraska focused on their newly built interactive data exploration system IDEA [66].
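As an illustration of an offline synopsis, here is a minimal sketch of an equi-width histogram that answers a range COUNT query without touching the base data. It is not taken from the survey. It assumes all values lie in [lo, hi] and that values are spread uniformly inside each bucket. The names `build_histogram` and `approx_count` are hypothetical.

```python
def build_histogram(values, lo, hi, buckets):
    """Build an equi-width histogram offline; assumes lo <= v <= hi for all v."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so that v == hi falls into the last bucket.
        b = min(int((v - lo) / width), buckets - 1)
        counts[b] += 1
    return counts, lo, width


def approx_count(hist, a, b):
    """Estimate COUNT(*) WHERE a <= x < b from the histogram alone.

    Assumption: values are uniformly distributed inside each bucket, so a
    bucket contributes its count scaled by the overlapping fraction.
    """
    counts, lo, width = hist
    total = 0.0
    for i, count in enumerate(counts):
        left = lo + i * width
        right = left + width
        overlap = max(0.0, min(b, right) - max(a, left))
        total += count * overlap / width
    return total


if __name__ == "__main__":
    hist = build_histogram(range(1000), 0, 1000, 10)
    print(approx_count(hist, 250, 750))
```

The query cost depends only on the number of buckets, which is why such synopses are fast, while the uniformity assumption inside a bucket is the source of approximation error on real data.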

Online Aggregation Methods

Online Aggregation

Error Estimation

Error Estimation with Known Distribution

Error Estimation without Known Distribution

Online Aggregation on Multiple Tables

Online AQP in Distributed Setting

Database Learning

Approximate Hardware

Other Works

Offline Methods

Pre‐computed Samples

Histograms

Wavelets

Sketches

Materialized Views

Bounded Resources

Bounded Error and Bounded Time

AQP on Complex Data

AQP on Spatial Data

Online Spatial AQP

Offline Spatial AQP

AQP on Trajectory Data

AQP Applications

AQP on Data Cleaning

AQP on Data Visualization

AQP Systems

Distributed AQP Systems

Other AQP Database Engines

AQP Model

Approximate Data Visualization

Smarter Query Plan

Synopsis Generation in Distributed Setting

Online Algorithms for Non‐Gaussian Distribution

Conclusion
