AdaptDB

Yi Lu, Samuel Madden, Alekh Jindal, Anil Shanbhag

doi:10.14778/3055540.3055551

Abstract

Big data analytics often involves complex join queries over two or more tables. Such join processing is expensive in a distributed setting, both because large amounts of data must be read from disk and because data must be shuffled across the network. Many techniques based on data partitioning have been proposed to reduce the amount of data that must be accessed, often focusing on finding the best partitioning scheme for a particular workload rather than adapting to changes in the workload over time. In this paper, we present AdaptDB, an adaptive storage manager for analytical database workloads in a distributed setting. It works by partitioning datasets across a cluster and incrementally refining the data partitioning as queries are run. AdaptDB introduces a novel hyper-join that avoids expensive data shuffling by identifying storage blocks of the joining tables that overlap on the join attribute and joining only those blocks. Hyper-join performs well when each block in one table overlaps with few blocks in the other table, since that minimizes the number of blocks that have to be accessed. To minimize the number of overlapping blocks for common join queries, AdaptDB uses smooth repartitioning to repartition small portions of the tables on join attributes as queries run. A prototype of AdaptDB running on top of Spark improves query performance by 2-3x on TPC-H as well as on a real-world dataset, versus a system that employs scans and shuffle-joins.
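The block-overlap idea behind hyper-join can be illustrated with a minimal single-machine sketch. This is not AdaptDB's implementation: the names `Block`, `overlapping_pairs`, and `block_join` are hypothetical, and the sketch assumes each non-empty block exposes the minimum and maximum value of its join attribute. It only shows how comparing key ranges lets a join skip block pairs that cannot produce matches.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join_key, payload); an illustrative row shape


@dataclass
class Block:
    """Hypothetical storage block; assumed to contain at least one row."""
    block_id: str
    rows: List[Row]

    @property
    def key_range(self) -> Tuple[int, int]:
        # Min and max of the join attribute within this block.
        keys = [key for key, _ in self.rows]
        return min(keys), max(keys)


def overlapping_pairs(left: List[Block], right: List[Block]) -> List[Tuple[Block, Block]]:
    """Return the block pairs whose join-key ranges intersect."""
    pairs = []
    for lb in left:
        l_lo, l_hi = lb.key_range
        for rb in right:
            r_lo, r_hi = rb.key_range
            # Two closed intervals intersect iff each starts before the other ends.
            if l_lo <= r_hi and r_lo <= l_hi:
                pairs.append((lb, rb))
    return pairs


def block_join(left: List[Block], right: List[Block]) -> List[Tuple[int, str, str]]:
    """Equi-join that only touches block pairs with overlapping key ranges."""
    results = []
    for lb, rb in overlapping_pairs(left, right):
        # Build a small hash index on the right block, then probe with the left.
        index: Dict[int, List[str]] = {}
        for key, payload in rb.rows:
            index.setdefault(key, []).append(payload)
        for key, payload in lb.rows:
            for match in index.get(key, []):
                results.append((key, payload, match))
    return results
```

In this sketch, the fewer right-hand blocks each left-hand block overlaps, the fewer pairs `overlapping_pairs` returns and the fewer blocks the join has to read, which is the property the abstract says repartitioning on the join attribute is meant to improve.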

Journal: Proceedings of the VLDB Endowment
Publication Date: Jan 1, 2017
Citations: 48

