Random access with a distributed Bitmap Join Index for Star Joins

Jaqueline J. Brito, Thiago Mosqueiro, Ricardo R. Ciferri, Cristina D. A. Ciferri

doi:10.1016/j.heliyon.2020.e03342

Abstract

Indices improve the performance of relational databases, especially on queries that return a small portion of the data (i.e., low-selectivity queries). Star joins are particularly expensive operations that commonly rely on indices for improved performance at scale. The development and support of index-based solutions for Star Joins are still at very early stages. To address this gap, we propose a distributed Bitmap Join Index (dBJI) and a framework-agnostic strategy to solve join predicates in linear time. For empirical analysis, we used common Hadoop technologies (e.g., HBase and Spark) to show that dBJI significantly outperforms full scan approaches by a factor between 59% and 88% in queries with low selectivity from the Star Schema Benchmark (SSB). Thus, distributed indices may significantly enhance low-selectivity query performance even in very large databases.

Highlights

The volume of available data has changed the design and value of decision-making systems across a broad range of fields [1, 2, 3]
We propose a strategy that combines distributed indices and a two-layer architecture based on open-source frameworks to accelerate Star Join queries with low selectivity
By employing an Access Layer able to perform random access, we propose a distributed Bitmap Join Index that leverages the parallelism provided by the Processing Layer to solve Star Joins (Section 4.2)

Summary

Introduction

The volume of available data has changed the design and value of decision-making systems across a broad range of fields [1, 2, 3].

The Bitmap Join Index is composed of bitmap arrays that represent the occurrence of attribute values from dimension tables in the tuples of the fact table [20]. A Bitmap Join Index for an attribute α from the dimension table D is a set of bitmap arrays, one for every distinct value of α. For every value x of the attribute α, the bitmap for α = x contains one bit for each tuple of the fact table, indexed by its primary key pk_f. Each bit records whether the value x occurs (1) or does not occur (0) in the corresponding tuple of the fact table. Only tuples 2 and 9 from the fact table should be retrieved via random access.
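The Bitmap Join Index definition can be illustrated with a minimal, single-machine sketch. It is not the distributed dBJI implementation from the paper. The toy tables, the column names (custkey, datekey, region, year, revenue) and the helper functions are hypothetical, and the sketch assumes that the position of a fact tuple in the list coincides with its primary key. Each bitmap is stored as a Python integer, so a conjunction of predicates on two dimensions reduces to a bitwise AND, and only the tuples whose bits remain set are fetched by position.

```python
from collections import defaultdict


def build_bitmap_join_index(fact_rows, dimension, fk, attribute):
    """Return {value: bitmap} for one dimension attribute.

    Bit i of the bitmap for value x is 1 when the fact tuple at position i
    joins with a dimension row whose `attribute` equals x.
    """
    index = defaultdict(int)
    for position, row in enumerate(fact_rows):
        value = dimension[row[fk]][attribute]
        index[value] |= 1 << position
    return dict(index)


def matching_positions(bitmap):
    """List the positions of the bits set to 1, lowest position first."""
    positions = []
    position = 0
    while bitmap:
        if bitmap & 1:
            positions.append(position)
        bitmap >>= 1
        position += 1
    return positions


# Hypothetical toy star schema: two dimension tables and one fact table.
customer = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}}
date = {10: {"year": 1997}, 11: {"year": 1998}}
fact = [
    {"pk": 0, "custkey": 1, "datekey": 10, "revenue": 100},
    {"pk": 1, "custkey": 2, "datekey": 10, "revenue": 250},
    {"pk": 2, "custkey": 1, "datekey": 11, "revenue": 75},
    {"pk": 3, "custkey": 1, "datekey": 10, "revenue": 300},
    {"pk": 4, "custkey": 2, "datekey": 11, "revenue": 50},
]

region_index = build_bitmap_join_index(fact, customer, "custkey", "region")
year_index = build_bitmap_join_index(fact, date, "datekey", "year")

# region = 'ASIA' AND year = 1997 becomes a bitwise AND of two bitmaps.
selected = region_index["ASIA"] & year_index[1997]
positions = matching_positions(selected)

# Stand-in for random access: fetch only the selected fact tuples.
result = [fact[p] for p in positions]
total_revenue = sum(row["revenue"] for row in result)
```

In this toy data, positions holds the primary keys of the fact tuples that satisfy both predicates, and the remaining tuples are never read. In a distributed setting the list lookup would be replaced by random reads against a store that supports them, such as the Access Layer mentioned in the highlights.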

Objectives

Methods

Findings

Conclusion