THEORETICAL BASIS FOR CREATING A REAL-WORLD DATA LAKE ARCHITECTURE

Markiyan Pyts, Ivanna Dronyuk

doi:10.31891/csit-2023-2-9

Abstract

Data lakes are a method for storing and managing large quantities of unstructured data. Businesses of any size can use this data to derive valuable insights, such as opportunities for process improvement or patterns of product usage. Although this approach to extracting insights is powerful, few studies describe the actual implementation architectures of data lakes and data warehouses. This article provides a broad overview of setting up a data lake on AWS (Amazon Web Services). It covers setting up an Application Programming Interface (API) to consume data, storing and visualizing that data, and quickly creating data lakes across multiple AWS accounts with a single Command-Line Interface (CLI) command. This is useful for creating a scalable data lake or data warehouse setup that does not require much manual work. We describe how such a design can be achieved using an infrastructure-as-code approach and propose an AWS architecture for solving the data storage task. The article provides a diagram of the proposed architecture, accompanied by a high-level description and theoretical background.

Full Text