Abstract

For more than three decades, data warehouses have been considered the only business intelligence storage system for enterprises. However, with the advent of big data, they have been modernized to handle the variety and dynamics of data by adopting the data lake as a centralized data source that integrates heterogeneous sources. Indeed, the data lake is characterized by its flexibility and performance in storing and analyzing data. However, because no schema is imposed on the data during ingestion, the data lake risks turning into a data swamp, so metadata management is essential to exploit it. In this paper, we present a conceptual metadata management model for the data lake. Our solution is based on a functional architecture of the data lake and on a set of features that make the metadata model generic. Furthermore, we present a set of transformation rules for translating our conceptual model into an OWL ontology.
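The abstract does not list the transformation rules themselves. As a rough illustration of what such rules typically look like, the sketch below assumes three generic mappings: each conceptual class becomes an `owl:Class`, each attribute becomes an `owl:DatatypeProperty`, and each association becomes an `owl:ObjectProperty`. It emits OWL in Turtle syntax as plain text. The classes `ConceptClass` and `Association`, the function `to_owl_turtle`, the namespace `http://example.org/datalake#`, and the example elements `Dataset`, `Metadata`, and `describes` are all hypothetical and are not the paper's own rules or metamodel.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified conceptual-model elements (not the paper's metamodel).
@dataclass
class ConceptClass:
    name: str
    # attribute name -> XSD datatype name (e.g. "string", "dateTime")
    attributes: dict[str, str] = field(default_factory=dict)

@dataclass
class Association:
    name: str
    source: str  # name of the domain class
    target: str  # name of the range class

# Assumed example namespace; replace with a real ontology IRI.
PREFIXES = (
    "@prefix : <http://example.org/datalake#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)

def to_owl_turtle(classes: list[ConceptClass], associations: list[Association]) -> str:
    """Apply three assumed mapping rules and return an OWL ontology as Turtle text."""
    lines = [PREFIXES]
    for c in classes:
        # Rule 1 (assumed): a conceptual class becomes an owl:Class.
        lines.append(f":{c.name} a owl:Class .")
        for attr, xsd_type in c.attributes.items():
            # Rule 2 (assumed): an attribute becomes an owl:DatatypeProperty
            # whose domain is the owning class and whose range is an XSD type.
            lines.append(
                f":{attr} a owl:DatatypeProperty ; "
                f"rdfs:domain :{c.name} ; rdfs:range xsd:{xsd_type} ."
            )
    for a in associations:
        # Rule 3 (assumed): an association becomes an owl:ObjectProperty
        # linking its source class (domain) to its target class (range).
        lines.append(
            f":{a.name} a owl:ObjectProperty ; "
            f"rdfs:domain :{a.source} ; rdfs:range :{a.target} ."
        )
    return "\n".join(lines) + "\n"

# Usage with hypothetical model elements.
classes = [ConceptClass("Dataset", {"ingestionDate": "dateTime"}), ConceptClass("Metadata")]
associations = [Association("describes", "Metadata", "Dataset")]
print(to_owl_turtle(classes, associations))
```

Generating Turtle text keeps the sketch dependency-free. A production version would more likely build the graph with an RDF library and validate it. It would also need to handle attributes that share a name across classes, which this sketch declares more than once.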

Highlights

  • The main role of the decision-making system is to help decision-makers make effective strategic decisions within companies

  • After briefly presenting the different data lake architectures in the literature, we focus on multizone architectures [8,11,12], because they better match the definition of the data lake [13]

  • To support the use of the data lake as a heterogeneous source for data warehouses, a conceptual metadata management model is presented to prevent the data lake from turning into a data swamp


Summary

Introduction

The main role of the decision-making system is to help decision-makers make effective strategic decisions within companies. […] (3) a set of functionalities that the system must provide to manage the traceability, confidentiality, quality, and aggregation of data. These metadata features help structure and contextualize the data stored in the data lake. Existing work on metadata management models [5,6] identifies eight key features for designing a good metadata management system: semantic enrichment, data polymorphism, data versioning, usage tracking, categorization, similarity links, metadata properties, and multiple granularity levels. In our work, these features are not sufficient when the data lake is used as the single source for the data warehouse. We conclude with a brief summary of our work and future perspectives.
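To make the eight generic features concrete, the sketch below models a metadata record with one field per feature. It covers only those generic features, not the additional functionalities (traceability, confidentiality, quality, aggregation) that the authors argue are needed when the data lake feeds a data warehouse. `MetadataRecord`, `jaccard`, `link_similar`, and the example dataset names are hypothetical. Deriving similarity links from the Jaccard index over semantic tags is an assumed, swapped-in technique, not the method from the paper or from [5,6].

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MetadataRecord:
    """Hypothetical metadata record: one field per generic feature."""
    dataset_id: str
    granularity: str  # multiple granularity levels, e.g. "dataset" or "column"
    semantic_tags: set[str] = field(default_factory=set)           # semantic enrichment
    representations: dict[str, str] = field(default_factory=dict)  # data polymorphism: format -> location
    versions: list[str] = field(default_factory=list)              # data versioning
    usage_log: list[tuple[str, datetime]] = field(default_factory=list)  # usage tracking
    categories: set[str] = field(default_factory=set)              # categorization
    similar_to: set[str] = field(default_factory=set)              # similarity links
    properties: dict[str, str] = field(default_factory=dict)       # metadata properties

    def add_version(self, version_id: str) -> None:
        # Append a new version identifier; older versions are kept.
        self.versions.append(version_id)

    def record_usage(self, user: str) -> None:
        # Log who accessed the dataset and when (UTC).
        self.usage_log.append((user, datetime.now(timezone.utc)))

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |a ∩ b| / |a ∪ b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_similar(records: list[MetadataRecord], threshold: float = 0.5) -> None:
    """Assumed technique: link two records when their tag sets are similar enough."""
    for i, r1 in enumerate(records):
        for r2 in records[i + 1:]:
            if jaccard(r1.semantic_tags, r2.semantic_tags) >= threshold:
                r1.similar_to.add(r2.dataset_id)
                r2.similar_to.add(r1.dataset_id)

# Usage with hypothetical datasets.
sales = MetadataRecord("sales_2023", "dataset", semantic_tags={"sales", "customer", "invoice"})
crm = MetadataRecord("crm_export", "dataset", semantic_tags={"customer", "sales", "contact"})
logs = MetadataRecord("web_logs", "dataset", semantic_tags={"clickstream"})
link_similar([sales, crm, logs])
```

Here `sales` and `crm` share two of four distinct tags (Jaccard index 0.5), so they are linked, while `web_logs` stays unlinked. The pairwise loop is quadratic in the number of records, which is acceptable for a sketch but not for a large catalog.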

Related work
The Architecture of data lake
Topology of metadata
Managing metadata in the data lake
Functional architecture of the data lake
Implementation
Conclusion