Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance

Yassine Ramdane, Omar Boussaid, Doulkifli Boukraà, Nadia Kabachi, Fadila Bentayeb

doi:10.1016/j.parco.2022.102918

Open Access

https://doi.org/10.1016/j.parco.2022.102918

Journal: Parallel Computing
Publication Date: Mar 3, 2022
Citations: 10
License type: publisher-specific-oa

Affiliation: Claude Bernard University Lyon 1, University of Jijel

Abstract

Improving OLAP (Online Analytical Processing) query performance in a distributed system on top of Hadoop is a challenging task. An OLAP cube query comprises several relational operations, such as selection, join, and group-by aggregation. It is well known that star join and group-by aggregation are the most costly operations in a Hadoop database system. These operations increase network traffic and may overflow memory. To overcome these difficulties, numerous partitioning and data load balancing techniques have been proposed in the literature. However, some issues remain open, such as reducing the number of Spark stages and the network I/O for an OLAP query executed on a distributed system. In previous work, we proposed a novel data placement strategy for a big data warehouse over a Hadoop cluster. This data warehouse schema enhances the projection, selection, and star-join operations of an OLAP query, such that the system’s query optimizer can perform a star join locally, in only one Spark stage and without a shuffle phase. The system can also skip loading unnecessary data blocks when evaluating the predicates. In this paper, we extend our previous work with further technical details and experiments, and we propose a new dynamic approach to improve the group-by aggregation. To evaluate our approach, we conduct experiments on a cluster of 15 nodes. Experimental results show that our method outperforms existing approaches in terms of OLAP query evaluation time.

Full Text