ETLMR: A Highly Scalable Dimensional ETL Framework Based on MapReduce

Xiufeng Liu, Torben Bach Pedersen, Christian Thomsen

doi:10.1007/978-3-642-37574-3_1

Abstract

Extract-Transform-Load (ETL) flows periodically populate data warehouses (DWs) with data from different source systems. An increasing challenge for ETL flows is processing huge volumes of data quickly. MapReduce is establishing itself as the de facto standard for large-scale data-intensive processing. However, MapReduce lacks support for high-level ETL-specific constructs, resulting in low ETL programmer productivity. This paper presents a scalable dimensional ETL framework, ETLMR, based on MapReduce. ETLMR has built-in native support for operations on DW-specific constructs such as star schemas, snowflake schemas and slowly changing dimensions (SCDs). This enables ETL developers to construct scalable MapReduce-based ETL flows with very few lines of code. To achieve good performance and load balancing, a number of dimension and fact processing schemes are presented, including techniques for efficiently processing different types of dimensions. The paper describes the integration of ETLMR with a MapReduce framework and evaluates its performance on large realistic data sets. The experimental results show that ETLMR achieves very good scalability and compares favourably with other MapReduce data warehousing tools.

Similar Papers

ETLMR: A Highly Scalable Dimensional ETL Framework Based on MapReduce
Xiufeng Liu ... Christian Thomsen
01 Jan 2010

MapReduce-based dimensional ETL made easy
Xiufeng Liu ... Christian Thomsen
Proceedings of the VLDB Endowment | VOL. 5
01 Aug 2012

CloudETL
Xiufeng Liu ... Torben Bach Pedersen
01 Jan 2014

Sales Analysis on Garment Industry with Datawarehouse and ETL Implementation on Star Schema
Dasmond Tan ... Santo Fernandi Wijaya
Indonesian Journal of Computer Science | VOL. 13
27 Feb 2024
