Efficient query processing framework for big data warehouse: an almost join-free approach

Huiju Wang, Furong Li, Xuan Zhou, Zuoyan Qin, Xiongpai Qin, Shan Wang, Qing Zhu

doi:10.1007/s11704-014-4025-6

Abstract

The rapidly increasing scale of data warehouses is challenging today's data analytics technologies. A conventional data analytics platform processes data warehouse queries using a star schema: it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users' demands. This model is space-economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prevents a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, even though many join results could be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the post-processing phase, so that data warehouse queries can be transformed into massively parallel filter-aggregation operations on the fact table. In contrast to conventional query processing models, our approach is efficient, scalable, and stable regardless of the number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.
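The idea of moving joins out of query time can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tables, the column names, and the helper functions `prejoin` and `filter_aggregate` are all hypothetical, and only the pre-processing half of the approach is shown (the post-processing step is omitted). Dimension attributes are folded into the fact rows once, after which a query reduces to a single filter-and-aggregate scan.

```python
from collections import defaultdict

# Hypothetical toy star schema: one fact table and two dimension tables.
customers = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}}
products = {10: {"category": "books"}, 11: {"category": "toys"}}
sales = [
    {"customer_id": 1, "product_id": 10, "revenue": 5.0},
    {"customer_id": 1, "product_id": 11, "revenue": 7.0},
    {"customer_id": 2, "product_id": 10, "revenue": 3.0},
    {"customer_id": 2, "product_id": 11, "revenue": 4.0},
]


def prejoin(fact_rows, dims):
    """Pre-processing: fold dimension attributes into each fact row once."""
    wide = []
    for row in fact_rows:
        merged = dict(row)
        for key, table in dims.items():
            merged.update(table[row[key]])
        wide.append(merged)
    return wide


def filter_aggregate(rows, predicate, group_key, measure):
    """Query time: one scan that filters and aggregates, with no join."""
    totals = defaultdict(float)
    for row in rows:
        if predicate(row):
            totals[row[group_key]] += row[measure]
    return dict(totals)


# The join cost is paid once here, not once per query.
wide_sales = prejoin(sales, {"customer_id": customers, "product_id": products})

# "Revenue per product category for customers in ASIA" as a pure scan.
result = filter_aggregate(
    wide_sales, lambda r: r["region"] == "ASIA", "category", "revenue"
)
```

Because each row of the widened table is self-contained, the scan could be split across partitions and the partial totals summed afterwards, which is the property that makes filter-aggregation straightforward to parallelize.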

Journal: Frontiers of Computer Science	Publication Date: Jan 26, 2015
Citations: 14

Similar Papers

Dynamic maintenance of multidimensional range data partitioning for parallel data processing
Junping Sun ... William I Grosky
01 Nov 1998

Adaptive use of a cluster of PCs for data warehousing applications
Amit Rudra ... Raj Gopalan
01 Mar 2000

Model data warehouse untuk Operasional petugas pemadam kebakaran Pada dinas pemadam kebakaran provinsi DKI Jakarta
Harco Leslie Hendric Spits Warnars ... Kharis Munawar
Jurnal Ilmu Komputer | VOL. 12
09 Apr 2019

Dimensions based data clustering and zone maps
Mohamed Ziauddin ... Dmitry Potapov
Proceedings of the VLDB Endowment | VOL. 10
01 Aug 2017
