Abstract
Data lakes are storage repositories that hold large amounts of data in their native format, whether structured, semi-structured, or unstructured, to be used when needed. They support a wide range of use cases, such as advanced analytics and the extraction of knowledge patterns. However, simply dumping all data into a data lake leads only to a so-called data swamp. To prevent this, enterprises can adopt best practices, among which is building and maintaining metadata. In recent years, there has been a growing body of research on managing metadata in data lake environments. Existing research efforts deal separately with individual activities such as metadata modeling, metadata capture and extraction, and metadata usage. Nevertheless, despite its importance, a global view of the research landscape on metadata management for data lakes is still missing. This survey brings together the different facets of metadata management in data lakes and presents a global view, along with the technological implications and the features required for building successful metadata management systems. It also summarizes and discusses research gaps, open problems, and the main challenges facing both industry practitioners and academics. This survey pertains to the broader field of Big Data, and especially to the data platforms that manage enterprise big data assets. Furthermore, given the parallels between data lakes and digital libraries in their dependence on metadata for content management, this study could offer valuable insights to the digital library community by giving it a technological outlook on metadata management.