A generic metadata management model for heterogeneous sources in a data warehouse

Lamya Oukhouya, Brahim Er-Raha, Anass El Haddadi, Hiba Asri, S Krit

doi:10.1051/e3sconf/202129701069

Open Access

https://doi.org/10.1051/e3sconf/202129701069

Abstract

For more than three decades, data warehouses have been considered the only business intelligence storage system for enterprises. However, with the advent of big data, they have been modernized to support the variety and dynamics of data by adopting the data lake as a centralized data source for heterogeneous sources. Indeed, the data lake is characterized by its flexibility and performance when storing and analyzing data. However, the absence of a schema on the data during ingestion increases the risk that the data lake turns into a data swamp, so metadata management is essential to exploit the data lake. In this paper, we present a conceptual metadata management model for the data lake. Our solution is based on a functional architecture of the data lake as well as on a set of features that ensure the genericity of the metadata model. Furthermore, we present a set of transformation rules that translate our conceptual model into an OWL ontology.

Highlights

The main role of the decision-making system is to help decision-makers to effectively broaden their strategic decision-making within companies
Having briefly reviewed the data lake architectures described in the literature, we focus on multizone architectures [8,11,12], because they are better suited to the definition of the data lake [13]
As part of the use of the data lake as a heterogeneous source for data warehouses, a conceptual metadata management model was presented to address the issues associated with the transformation of the data lake into a data swamp

Summary

Introduction

The main role of the decision-making system is to help decision-makers to effectively broaden their strategic decision-making within companies. 3) a set of functionalities that the system must ensure to manage traceability, confidentiality, quality and aggregation of data. These metadata features help structure and contextualize the data stored in the data lake. According to existing work on metadata management models [5] [6], eight key features have so far been used to design a good metadata management system: semantic enrichment, data polymorphism, data versioning, usage tracking, categorization, similarity links, metadata properties, and multiple granularity levels. For our work, these features are not sufficient when the data lake is used as a single source for the data warehouse. We summarize our work with a brief conclusion and future perspectives.
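To make the eight features listed above more concrete, the following is a minimal sketch of how a single metadata record could carry them. It is not the model proposed in the paper: the class name `MetadataEntry`, the `Granularity` levels, and every field and method name are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from enum import Enum


class Granularity(Enum):
    # Hypothetical levels; the text only says that multiple levels exist.
    DATASET = "dataset"
    TABLE = "table"
    COLUMN = "column"


@dataclass
class MetadataEntry:
    """Hypothetical metadata record for one object stored in the data lake."""
    object_id: str
    granularity: Granularity                            # multiple granularity levels
    representation: str = "raw"                         # data polymorphism
    version: int = 1                                    # data versioning
    semantic_tags: set = field(default_factory=set)     # semantic enrichment
    categories: set = field(default_factory=set)        # categorization
    properties: dict = field(default_factory=dict)      # metadata properties
    similar_to: dict = field(default_factory=dict)      # similarity links: id -> score
    usage_log: list = field(default_factory=list)       # usage tracking

    def record_access(self, user, action):
        """Append a timestamped usage event (who did what)."""
        self.usage_log.append((datetime.now(timezone.utc), user, action))

    def new_version(self, representation=None):
        """Return a successor entry with an incremented version number.

        Containers are copied so that later changes to the new version
        do not alter the previous one; the usage log starts empty.
        """
        return replace(
            self,
            version=self.version + 1,
            representation=representation or self.representation,
            semantic_tags=set(self.semantic_tags),
            categories=set(self.categories),
            properties=dict(self.properties),
            similar_to=dict(self.similar_to),
            usage_log=[],
        )


if __name__ == "__main__":
    entry = MetadataEntry("sales_2021", Granularity.TABLE)
    entry.semantic_tags.add("revenue")
    entry.record_access("analyst", "read")
    cleaned = entry.new_version(representation="cleaned")
    print(cleaned.version, cleaned.representation, sorted(cleaned.semantic_tags))
```

In this sketch a new version is a separate record rather than an in-place update, so the history of an object stays inspectable; this is one possible design, and the additional features the paper argues for when the data lake feeds a data warehouse are not covered here.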

Related work

The Architecture of data lake

Topology of metadata

Managing metadata in the data lake

Functional architecture of the data lake

Implementation

Conclusion

Journal: E3S Web of Conferences	Publication Date: Jan 1, 2021
Citations: 2	License type: CC BY 4.0

Similar Papers

Journey from Data Warehouse to Data Lake
Geeta Rani ... Avinash Sharma
08 May 2024

Modeling metadata in data lakes—A generic model
Rebecca Eichler ... Bernhard Mitschang
Data & Knowledge Engineering | VOL. 136
22 Sep 2021

HANDLE - A Generic Metadata Model for Data Lakes
Rebecca Eichler ... Christoph Gröger
01 Jan 2020

Data Lakes: Trends and Perspectives
Franck Ravat ... Yan Zhao
01 Jan 2019
