Abstract
Data analysis is an essential business need for organizations seeking to optimize performance. Data is loaded into a data warehouse (DWH) using Extract, Transform and Load (ETL), and analytics are then run on the DWH. The largest share of the cost and execution time in this workflow is associated with the Extract and Transform (ET) steps. Recent approaches based on Hadoop, an open source Apache framework for data-intensive scalable computing, provide an alternative for ET that is both cheaper and faster than prevalent commercial ETL tools. This paper presents a case study with experimental metric results that support this claim. The lower cost makes the approach viable for small and large organizations alike, and the shorter execution time makes it possible to provide online data services.