The Challenge of Building Effective, Enterprise-scale Data Lakes

Awez Syed

doi:10.1145/3318464.3393816

Abstract

There has been a rapid rise in the popularity of data lakes as the data infrastructure for modern analytics and data science. The combination of cloud storage and fast, elastic processing provides an inexpensive and scalable solution for building analytical applications. While data lakes make it easy to ingest and store vast amounts of data, the ability to effectively make use of that data is still limited. This data often lacks context, doesn't meet the quality required for applications, and is not easily understandable or discoverable by users. Problems of data consistency and accuracy make it hard to derive value from data lakes and to trust the analytics based on this data. The traditional methods of manually documenting, classifying and assessing the data don't scale to the volume of cloud-based data lakes. New automated, learning-based approaches are required to discover, curate and make the data usable for a wide variety of users. In this talk, we describe the real-world implementation patterns of data lakes and give an overview of the many open challenges in deploying successful, enterprise-scale data lakes.
