EntropyDB: a probabilistic approach to approximate query processing

Laurel Orr, Magdalena Balazinska, Dan Suciu

doi:10.1007/s00778-019-00582-9

Abstract

We present EntropyDB, an interactive data exploration system that uses a probabilistic approach to generate a small, queryable summary of a dataset. Departing from traditional summarization techniques, we use the Principle of Maximum Entropy to generate a probabilistic representation of the data that can be used to give approximate query answers. We develop the theoretical framework and formulation of our probabilistic representation and show how to use it to answer queries. We then present solving techniques, give two critical optimizations to improve preprocessing time and query execution time, and explore methods to reduce query error. Lastly, we experimentally evaluate our work using a 5 GB dataset of flights within the USA and a 210 GB dataset from an astronomy particle simulation. While our current work supports only linear queries, we show that our technique can answer queries faster than sampling while introducing, on average, no more error than sampling, and that it can better distinguish between rare and nonexistent values. We also discuss extensions that can allow for data updates and linear queries over joins.
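As a rough illustration of the idea in the abstract (not the paper's solver or its optimizations), the sketch below fits a maximum-entropy distribution over a tiny two-attribute domain and uses it to answer linear counting queries. Everything named here is an assumption made for illustration: the functions `fit_maxent` and `estimate_count`, the toy flight rows, and the choice of one-dimensional marginal counts as the only statistics. Iterative proportional fitting is used as a standard stand-in for fitting a maximum-entropy model under marginal constraints.

```python
from collections import Counter, defaultdict
from itertools import product


def fit_maxent(rows, domains, iters=20):
    """Fit a max-entropy distribution matching each attribute's 1D marginal.

    Hypothetical helper: uses iterative proportional fitting over the full
    cross product of the attribute domains (only feasible for toy domains).
    """
    n = len(rows)
    # Observed statistics: per-attribute value counts.
    targets = [Counter(r[i] for r in rows) for i in range(len(domains))]
    cells = list(product(*domains))
    # Start from the uniform distribution, the max-entropy prior.
    p = {c: 1.0 / len(cells) for c in cells}
    for _ in range(iters):
        for i, target in enumerate(targets):
            # Current model marginal for attribute i.
            current = defaultdict(float)
            for c, pc in p.items():
                current[c[i]] += pc
            # Rescale each cell so the model marginal matches the data.
            for c in cells:
                have = current[c[i]]
                want = target[c[i]] / n
                p[c] = p[c] * want / have if have > 0 else 0.0
    return p, n


def estimate_count(p, n, predicate):
    """Answer a linear counting query: n times the predicate's probability."""
    return n * sum(pc for c, pc in p.items() if predicate(c))


# Toy data (assumed): (origin, destination) pairs.
rows = [("SEA", "SFO"), ("SEA", "SFO"), ("SEA", "LAX"), ("PDX", "SFO")]
domains = [["SEA", "PDX"], ["SFO", "LAX"]]
model, n = fit_maxent(rows, domains)

from_sea = estimate_count(model, n, lambda c: c[0] == "SEA")
sea_to_sfo = estimate_count(model, n, lambda c: c == ("SEA", "SFO"))
print(from_sea, sea_to_sfo)
```

With only one-dimensional statistics the fitted model treats the attributes as independent, so a query on a single attribute is answered exactly while a query on a pair of values is only approximated. Adding multi-attribute statistics as further constraints would be the natural way to tighten such estimates in this sketch.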

Journal: The VLDB Journal
Publication Date: Nov 2, 2019
Citations: 7

