PI-Join: Efficiently processing join queries on massive data

Xixian Han, Donghua Yang, Jianzhong Li

doi:10.1007/s10115-011-0429-x

Abstract

The ratio of disk capacity to disk transfer rate typically increases by 10× per decade. As a result, disks are becoming slower from the applications' point of view, because applications must store and process much larger volumes of data. In database systems, the less data a query has to touch, the better its performance. The disk-based join is a common but time-consuming database operation, especially on massive data, where I/O cost dominates execution time. However, current join algorithms are suitable only for small or moderate data volumes. On massive data they incur high I/O cost, because they make multiple I/O passes over the joined tables and are insensitive to join selectivity. This paper proposes PI-Join, a novel disk-based join algorithm that efficiently processes join queries involving massive data. PI-Join consists of two stages: the JPIPT construction stage (JCS) and the result output stage (ROS). JCS runs a cache-conscious construction algorithm on the join attributes, which are stored in a column-oriented model, to quickly obtain the join positional index pair table (JPIPT) of the join results. ROS then uses the JPIPT to retrieve the results with a single-pass, sequential, selective scan of each table. We provide a correctness proof and a cost analysis of PI-Join. Our experimental results indicate that PI-Join has a significant advantage over existing join algorithms.
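The two stages described above can be illustrated with a small in-memory sketch. It assumes each table is a Python list of row tuples and that the join-attribute column of each table is available as a separate list, mimicking column-oriented storage of the join attributes. The JPIPT is modeled as a list of (position in R, position in S) pairs. The function names `build_jpipt`, `selective_scan` and `output_results` are hypothetical, and the hash join inside `build_jpipt` is a simple stand-in, not the paper's cache-conscious construction algorithm; disk I/O, buffering and memory limits are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[object, ...]
Pair = Tuple[int, int]  # (position in R, position in S): one JPIPT entry


def build_jpipt(r_keys: Sequence[object], s_keys: Sequence[object]) -> List[Pair]:
    """JCS stand-in: join the two join-attribute columns and return the
    position pairs of all matching rows. A plain hash join is used here;
    it is NOT the paper's cache-conscious construction algorithm."""
    index: Dict[object, List[int]] = defaultdict(list)
    for s_pos, key in enumerate(s_keys):
        index[key].append(s_pos)
    jpipt: List[Pair] = []
    for r_pos, key in enumerate(r_keys):
        for s_pos in index.get(key, ()):
            jpipt.append((r_pos, s_pos))
    return jpipt


def selective_scan(table: Sequence[Row], positions: Sequence[int]) -> Dict[int, Row]:
    """Read the table once, front to back, keeping only rows whose
    position appears in `positions`; stops after the last wanted row."""
    wanted = sorted(set(positions))
    fetched: Dict[int, Row] = {}
    i = 0
    for pos, row in enumerate(table):
        if i == len(wanted):
            break
        if pos == wanted[i]:
            fetched[pos] = row
            i += 1
    return fetched


def output_results(r_table: Sequence[Row], s_table: Sequence[Row], jpipt: List[Pair]) -> List[Row]:
    """ROS sketch: one selective scan per table, then assemble each
    result tuple as the concatenation of the matching R and S rows."""
    r_rows = selective_scan(r_table, [p for p, _ in jpipt])
    s_rows = selective_scan(s_table, [q for _, q in jpipt])
    return [r_rows[p] + s_rows[q] for p, q in jpipt]


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (3, "c")]
    S = [(3, "x"), (1, "y"), (1, "z"), (4, "w")]
    # Only the join-attribute columns are read in the first stage.
    jpipt = build_jpipt([r[0] for r in R], [s[0] for s in S])
    print(jpipt)  # [(0, 1), (0, 2), (2, 0)]
    print(output_results(R, S, jpipt))
    # [(1, 'a', 1, 'y'), (1, 'a', 1, 'z'), (3, 'c', 3, 'x')]
```

Sorting the needed positions before each scan is what lets each table be read once, front to back, retaining only the rows that appear in the join result; this mirrors the one-pass sequential selective scan the abstract attributes to ROS.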

Journal: Knowledge and Information Systems
Publication Date: Jul 1, 2011
Citations: 45

Similar Papers

Sampling environmental acoustic recordings to determine bird species richness
Jason Wimmer ... Ian Williamson
Ecological Applications | VOL. 23 | 01 Sep 2013

Traffic Flow Prediction with Parallel Data
Yuanyuan Chen ... Yisheng Lv
01 Nov 2018

Big Data Optimization for Communication Networks
Zhu Han ...
01 Jan 2017

Diagnosis of Diabetic Retinopathy through Retinal Fundus Images and 3D Convolutional Neural Networks with Limited Number of Samples
Ahsan Bin Tufail ... Wali Ullah Khan
Wireless Communications and Mobile Computing | VOL. 2021 | 01 Jan 2020