Abstract

Extraction, Transformation and Loading (ETL) is one of the notable subjects in the optimization, management, improvement and acceleration of processes and operations in databases and data warehouses. Building ETL processes is potentially one of the largest tasks in a data warehouse project, and producing them is time-consuming and complicated. Without optimization of these processes, implementing data warehouse projects is costly, complicated and slow. This paper combines parallelization methods with shared cache memory in distributed systems built around a data warehouse. In the evaluation conducted, the proposed method ran the ETL process 7.1% faster than the Kettle tool and 7.9% faster than the Talend tool in terms of execution time. The results indicate that parallelization can notably improve the ETL process, allowing the management and integration of big data to be carried out simply and at an acceptable speed.

Highlights

  • Extraction, Transformation and Loading (ETL) is one of the notable subjects in the optimization, management, improvement and acceleration of processes and operations in databases and data warehouses

  • Data warehouse applications run Extraction, Transformation and Loading (ETL) processes through tools that extract data from data sources, transform it into an acceptable format and load it into a data provider [1]

  • Building on an examination of the strengths and weaknesses of earlier research, this paper presents a combined method: parallelization techniques that use multiple cores simultaneously to process and manage databases in scattered locations, together with a cache memory shared between the cores that carry out the extraction, transformation and loading of data from the distributed databases into the main data warehouse at a fixed location
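
The combined idea in the last highlight, parallel workers per source plus one cache shared by all of them, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the source names, the `SharedCache` class, the country lookup and the use of threads as a stand-in for cores are all assumptions made for illustration, and in-memory lists replace the real distributed databases.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

# Hypothetical stand-ins for databases at scattered locations.
SOURCES = {
    "site_a": [{"id": 1, "country": "ir"}, {"id": 2, "country": "de"}],
    "site_b": [{"id": 3, "country": "ir"}, {"id": 4, "country": "fr"}],
}

# Hypothetical lookup that would be expensive in a real system
# (for example, a remote dimension table).
COUNTRY_NAMES = {"ir": "Iran", "de": "Germany", "fr": "France"}


class SharedCache:
    """Cache shared by all workers; a lock keeps updates consistent."""

    def __init__(self):
        self._data = {}
        self._lock = Lock()
        self.misses = 0  # how many times the lookup actually ran

    def get_or_compute(self, key, compute):
        # Holding the lock while computing guarantees that each key is
        # computed once, at the cost of serializing the lookups.
        with self._lock:
            if key not in self._data:
                self.misses += 1
                self._data[key] = compute(key)
            return self._data[key]


def extract(source_name):
    """Extraction: copy the rows out of one source."""
    return list(SOURCES[source_name])


def transform(rows, cache):
    """Transformation: replace country codes with names via the shared cache."""
    return [
        {"id": r["id"],
         "country": cache.get_or_compute(r["country"], COUNTRY_NAMES.__getitem__)}
        for r in rows
    ]


def run_etl(source_names, max_workers=4):
    """Extract and transform each source in parallel, then load centrally."""
    cache = SharedCache()
    warehouse = []

    def worker(name):
        return transform(extract(name), cache)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for rows in pool.map(worker, source_names):
            warehouse.extend(rows)  # loading happens in the main thread

    return sorted(warehouse, key=lambda r: r["id"]), cache.misses


if __name__ == "__main__":
    rows, misses = run_etl(["site_a", "site_b"])
    print(len(rows), "rows loaded;", misses, "lookups computed")
```

Because both workers share one cache, the code `"ir"` is looked up only once even though it appears in both sources. Threads are used here only to keep the sketch short; CPU-bound transformations in Python would need processes to occupy several cores.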

Summary

Toloie Eshlaghy

Extraction, Transformation and Loading (ETL) is one of the notable subjects in the optimization, management, improvement and acceleration of processes and operations in databases and data warehouses. Building ETL processes is potentially one of the largest tasks in a data warehouse project, and producing them is time-consuming and complicated. Without optimization of these processes, implementing data warehouse projects is costly, complicated and slow. Parallelization can notably improve the ETL process, allowing the management and integration of big data to be carried out simply and at an acceptable speed.

INTRODUCTION
CONCEPTS OF ETL
Extraction phase
Transformation phase
Loading Phase
Meta Data
ARCHITECTURE AND ANALYSIS OF THE RECOMMENDED
Shared Cache Memory
EVALUATION
CONCLUSION