Abstract

Big data (BD) is stored in vast raw data repositories called Data Lakes (DL). To make the data usable by its consumers and to uncover the relationships linking its content, BD requires new techniques of data integration and schema alignment. Metadata services that discover and describe the content of a DL can provide this. However, a systematic approach to such metadata discovery and management does not yet exist. We therefore propose a framework for profiling the informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We explicitly define a metadata management method that outlines the essential tasks needed to handle this properly. Using a prototype implementation applied to a real-world case study from the OpenML DL, we demonstrate the effectiveness and viability of our method alongside alternative techniques.
