Abstract

An open data lake stores open data in various forms and types, and there is increasing demand to manage raw data as tables rather than files for efficient data exploration and analysis. In this paper, we investigate data management in open data lakes and identify the limitations of table migration and related problems. First, open data lakes suffer from preprocessing complexity, scale limitation, and platform dependency, owing to traditional data management methods and the characteristics of open data. Second, existing studies on table migration suffer from a lack of scalability, migration incompleteness, and scale limitation. In this work, we present a novel automation framework, called Demeter, which solves the three problems inherent in open data lakes by expanding automation. Specifically, it automates catalog collection and preprocessing tasks to address preprocessing complexity and scale limitation. It also provides platform universality across representative data platforms by automating catalog analysis and the detailed processing logic. Demeter then solves the three problems in table migration by adopting Airbyte, an open-source ELT platform, and by enhancing its automation capability with the Airbyte manager. Through extensive experiments, we verify that Demeter resolves all of the problems above and demonstrate its scalability and universality. In addition, Demeter significantly outperforms CKAN by up to 508.5% in automation performance, up to 207.28% in processing time, and up to 917.17% in migration performance. These results indicate that Demeter is an effective automation framework that increases the utilization of large-scale open data and supports reliable Internet-scale migration.
