Pando: Enhanced Data Skipping with Logical Data Partitioning

Sivaprasad Sudhir, Cyrille Habis, Nikolay Laptev, Michael Cafarella, Wenbo Tao, Samuel Madden

doi:10.14778/3598581.3598601

Abstract

With enormous volumes of data, quickly retrieving the data relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of the query workload run over the data, have been proposed to tune the physical layout of data and reduce the number of blocks accessed. How effectively these methods skip blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant block-skipping opportunities because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando, which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. Across a range of benchmark and real-world workloads, Pando attains up to a 2.8X reduction in the number of blocks scanned and up to a 2.3X speedup in end-to-end query execution time over state-of-the-art techniques.
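To make the general idea of block skipping concrete, here is a minimal sketch of min/max (zone-map style) skipping in Python. It is not Pando's algorithm and does not model correlation-aware logical partitioning; it only illustrates the abstract's point that the number of blocks scanned depends on the stored metadata and on how well the physical layout aligns with the query. The names `Block`, `build_blocks`, and `blocks_scanned`, the synthetic data, and the block size are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Row = Tuple[int, int]  # (a, payload); hypothetical two-column table


@dataclass
class Block:
    rows: List[Row]
    min_a: int  # per-block metadata: smallest value of column a
    max_a: int  # per-block metadata: largest value of column a


def build_blocks(rows: Sequence[Row], block_size: int) -> List[Block]:
    """Cut rows into fixed-size blocks in the given order and record min/max of a."""
    blocks = []
    for start in range(0, len(rows), block_size):
        chunk = list(rows[start:start + block_size])
        a_values = [r[0] for r in chunk]
        blocks.append(Block(chunk, min(a_values), max(a_values)))
    return blocks


def blocks_scanned(blocks: Sequence[Block], lo: int, hi: int) -> int:
    """Count blocks whose [min_a, max_a] range overlaps the predicate lo <= a <= hi.

    Blocks with no overlap can be skipped without reading their rows.
    """
    return sum(1 for b in blocks if b.max_a >= lo and b.min_a <= hi)


# Synthetic data (assumption): a takes each value in 0..99 once, in scrambled order.
rows = [((i * 37) % 100, i) for i in range(100)]

arrival_layout = build_blocks(rows, block_size=10)         # layout ignores the query
sorted_layout = build_blocks(sorted(rows), block_size=10)  # layout clustered on a

# Same predicate, same metadata, different physical layouts.
scanned_arrival = blocks_scanned(arrival_layout, 20, 29)
scanned_sorted = blocks_scanned(sorted_layout, 20, 29)
print(scanned_arrival, scanned_sorted)
```

With the layout clustered on `a`, the predicate touches fewer blocks than with the scrambled layout, even though both layouts store the same metadata. Sorting on a single column is the simplest possible layout choice and stands in here only as a baseline for the workload-driven layouts the abstract refers to.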

Journal: Proceedings of the VLDB Endowment
Publication Date: May 1, 2023
Citations: 4

Similar Papers

Instance-Optimized Data Layouts for Cloud Analytics Workloads
Jialin Ding ... Johannes Gehrke
09 Jun 2021

Robustness in automatic physical database design
Kareem El Gebaly ... Ashraf Aboulnaga
25 Mar 2008

ICAS: an Extensible Framework for Estimating the Susceptibility of IC Layouts to Additive Trojans
Timothy Trippel ... Kevin B Bush
01 May 2020

Flexs – A Logical Model for Physical Data Layout
Hannes Voigt ... Wolfgang Lehner
01 Jan 2015
