Abstract

Online analytical processing (OLAP) is a core functionality in database systems. The performance of OLAP is crucial for making online decisions in many applications. However, supporting OLAP on large datasets, especially big data, is costly, and methods that compute exact answers cannot meet the high-performance requirement. To alleviate this problem, approximate query processing (AQP) has been proposed, which aims to efficiently find an approximate answer that is as close as possible to the exact answer. Existing AQP techniques can be broadly divided into two categories. (1) Online aggregation: select samples online and use these samples to answer OLAP queries. (2) Offline synopses generation: generate synopses offline based on a-priori knowledge (e.g., data statistics or query workload) and use these synopses to answer OLAP queries. We discuss the research challenges in AQP and summarize existing techniques that address these challenges. In addition, we review how to use AQP to support other complex data types, e.g., spatial data and trajectory data, and other applications, e.g., data visualization and data cleaning. We also introduce existing AQP systems and summarize their advantages and limitations. Lastly, we outline open research challenges and opportunities in AQP. We believe that this survey can help practitioners understand existing AQP techniques and select appropriate methods for their applications.
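To make the online aggregation idea concrete, the following is a minimal sketch, not a method taken from the survey. It assumes uniform sampling without replacement and a confidence interval based on the central limit theorem; the function name `online_avg` and its parameters (`batch_size`, `z`, `target_half_width`) are hypothetical and chosen only for illustration.

```python
import math
import random


def online_avg(values, batch_size=1000, z=1.96, target_half_width=0.5, seed=0):
    """Estimate AVG(values) from a growing uniform sample.

    Yields (sample_size, estimate, half_width) after each batch and stops
    once the CLT-based half-width drops to target_half_width or below.
    Assumption: no finite-population correction is applied.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # a random scan order yields uniform samples without replacement

    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and sum of squared deviations
    for idx in order:
        x = values[idx]
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0 or n == len(values):
            variance = m2 / (n - 1) if n > 1 else 0.0
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width
            if half_width <= target_half_width:
                return


if __name__ == "__main__":
    data = [i % 100 for i in range(100_000)]
    for n, estimate, half_width in online_avg(data):
        print(f"n={n:6d}  AVG ~ {estimate:.3f} +/- {half_width:.3f}")
```

The user sees an estimate that tightens as more rows are scanned and can stop as soon as the interval is narrow enough, which is the trade-off between accuracy and latency that online aggregation exposes.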

Highlights

  • Online analytical processing (OLAP) is a core functionality in data management and analytics systems [33]

  • Offline synopses generation: generate synopses offline based on a-priori knowledge and use these synopses to answer OLAP queries

  • With the help of previous query answers, one can learn more about the data distribution and infer the answers to new queries using a trained statistical model


Summary

Introduction

Online analytical processing (OLAP) is a core functionality in data management and analytics systems [33]. There are several query-driven methods, including pre-computed sampling-based approximate query (PSAQ), which needs to make assumptions about the query column set (QCS) or the queries, Histogram [88], Wavelet [46], and Sketch [14]. The advantages of these techniques are that the results are more accurate on skewed data and that query processing is fast (they do not need to select samples on the fly), but they have some limitations: they cannot support general queries, especially complex nested queries. Tim Kraska focused on IDEA [66], a newly built interactive data exploration system.
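As an illustration of why an offline synopsis such as a histogram answers queries quickly, here is a minimal sketch that builds an equi-width histogram once and then estimates a range COUNT from the bucket counts alone. It is not the construction used in [88]; the function names `build_histogram` and `approx_count`, and the assumption that values are spread uniformly within each bucket, are ours.

```python
def build_histogram(values, lo, hi, num_buckets):
    """Offline step: count how many values fall into each equi-width bucket of [lo, hi]."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bucket.
            bucket = min(int((v - lo) / width), num_buckets - 1)
            counts[bucket] += 1
    return counts


def approx_count(counts, lo, hi, a, b):
    """Query step: estimate COUNT(*) WHERE a <= x < b from the synopsis only.

    Assumption: values are uniformly distributed inside each bucket, so a
    partially covered bucket contributes in proportion to the overlap.
    """
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    synopsis = build_histogram(data, 0.0, 1000.0, 10)
    print(approx_count(synopsis, 0.0, 1000.0, 250.0, 750.0))
```

The query touches only the bucket counts, never the base data, so its cost depends on the number of buckets rather than the dataset size. The uniformity assumption is also where the error comes from when data inside a bucket is skewed.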

Online Aggregation Methods
Online Aggregation
Error Estimation
Error Estimation with Known Distribution
Error Estimation without Known Distribution
Online Aggregation on Multiple Tables
Online AQP in Distributed Setting
Database Learning
Approximate Hardware
Other Works
Offline Methods
Pre‐computed Samples
Histograms
Wavelets
Sketches
Materialized Views
Bounded Resources
Bounded Error and Bounded Time
AQP on Complex Data
AQP on Spatial Data
Online Spatial AQP
Offline Spatial AQP
AQP on Trajectory Data
AQP Applications
AQP on Data Cleaning
AQP on Data Visualization
AQP Systems
Distributed AQP Systems
Other AQP Database Engines
AQP Model
Approximate Data Visualization
Smarter Query Plan
Synopsis Generation in Distributed Setting
Online Algorithms for Non‐Gaussian Distribution
Conclusion
