Abstract

Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. Since ETL processes are complex and costly, it is important to reduce their development and maintenance costs. Modeling ETL processes at a conceptual level is a way to achieve this goal. However, existing ETL tools, like Microsoft Integration Services or Pentaho Data Integration (also known as Kettle), have their own specific language to define ETL processes. Further, there is no agreed-upon conceptual model to specify such processes. In this chapter, we study the design of ETL processes using a conceptual approach. The model we use is based on the Business Process Modeling Notation (BPMN), a de facto standard for specifying business processes. The model provides a set of primitives that cover the requirements of frequently used ETL processes. Since BPMN is already used for specifying business processes, users already familiar with BPMN do not need to learn another language for defining ETL processes. Further, BPMN provides a conceptual and implementation-independent specification of such processes, which hides technical details and allows users and designers to focus on essential characteristics of such processes. Finally, ETL processes expressed in BPMN can be translated into executable specifications for ETL tools.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call