Abstract

Data is increasing at an enormous rate every day. Traditionally data has resided in silosacross any organization,so it’s difficult to have a complete picture for data driven business decision making. Data lake addresses the problem of rate of increase of data by providing “schema on read”, better integration and cheaper storage. It also solves the data silos problemby providing a central platform for a variety of data housing needs. However, implementing a data lake becomes challenging as the implementation needs to address the additional needs like metadata management, data discovery, data governance, data lifecycle management, security and centralized access controls mechanisms. This paper intends to provide a comprehensive architecture of data lake to address these challenges. We have also conducted and documented our experiments with publicly available datasets about COVID19 to validate the design and applicability of the proposed architecture for business analytics purposes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call