Easy and effective parallel programmable ETL

Christian Thomsen, Torben Bach Pedersen

doi:10.1145/2064676.2064684

Abstract

Extract-Transform-Load (ETL) programs are used to load data into data warehouses (DWs). An ETL program must extract data from sources, apply different transformations to it, and use the DW to look up and insert the data. Both developing and running an ETL program are time consuming. Typically, however, an ETL program can exploit both task parallelism and data parallelism to run faster. This, on the other hand, lengthens development time, as creating a parallel ETL program is complex. To remedy this situation, we propose efficient ways to parallelize typical ETL tasks and implement these new constructs in an ETL framework. The constructs are easy to apply and require only a few modifications to an ETL program to parallelize it. They support both task and data parallelism and give the programmer different possibilities to choose from. An experimental evaluation shows that, by using a little more CPU time, the (wall-clock) time to run an ETL program can be greatly reduced.
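To make the two kinds of parallelism concrete, the following is a minimal sketch using only the Python standard library. It is not the framework or the constructs proposed in the paper; all names (`extract`, `transform`, `transform_and_load`, `run_etl`) and the row layout are hypothetical. Extraction and loading run as separate concurrent stages connected by a queue (task parallelism), while each batch of rows is transformed by a pool of workers (data parallelism).

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread

SENTINEL = None  # marks the end of the row stream


def extract(out_q, rows):
    """Stage 1: read source rows and hand them to the next stage."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transform(row):
    """A hypothetical per-row transformation (clean a name, rescale an amount)."""
    return {"name": row["name"].strip().upper(), "amount": row["amount"] * 100}


def transform_and_load(in_q, warehouse, workers=4, batch_size=2):
    """Stage 2: transform each batch in parallel, then append it to the target."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch = []
        while True:
            row = in_q.get()
            if row is not SENTINEL:
                batch.append(row)
            # Flush when the batch is full or the stream has ended.
            if batch and (row is SENTINEL or len(batch) >= batch_size):
                # pool.map preserves input order, so the load order is stable.
                warehouse.extend(pool.map(transform, batch))
                batch = []
            if row is SENTINEL:
                break


def run_etl(rows):
    """Run both stages concurrently; a plain list stands in for the DW."""
    q = Queue(maxsize=100)
    warehouse = []
    producer = Thread(target=extract, args=(q, rows))
    consumer = Thread(target=transform_and_load, args=(q, warehouse))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return warehouse
```

Threads are used here only for brevity. In CPython, CPU-bound transformations generally need separate processes to gain wall-clock time, which matches the abstract's trade-off of spending somewhat more CPU time to finish sooner.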

Similar Papers

ETLator - a scripting ETL framework
Miran Radonic ... Igor Mekterovic
01 May 2017

Braid: integrating task and data parallelism
E.A West ... A.S Grimshaw
06 Feb 1995

Study of Meta-Data Enrichment Methods to Achieve Near Real Time ETL
N Mohammed Muddasir ... K Raghuveer
05 Nov 2018

Task parallelism and high-performance languages
I Foster
01 Mar 1996
