Managing Of Meta Data in Data Lake for Data Profiling

Mr Fasi Ahmed Parvez Mohammad, Dr Manish Varshney

doi:10.52783/tjjpt.v44.i4.838

Abstract

Big data (BD) is stored in vast repositories of raw data called Data Lakes (DL). To make the data useful to its consumers and to uncover the relationships that tie its content together, BD requires new techniques for data integration and schema alignment. Metadata services that discover and describe the content of the DL can provide this. However, a systematic approach to such metadata discovery and management does not yet exist. We therefore propose a methodology, which we call information profiling, for profiling the informational content stored in the DL. The resulting profiles are stored as metadata to support data analysis. We explicitly define a metadata management process that outlines the essential tasks needed to handle this properly. Using a prototype implementation applied to a real-world case study from the OpenML DL, we demonstrate the effectiveness and viability of our method alongside other methodologies.

Full Text