Abstract

In the information age, data integration has become easier than ever. Enterprises integrate a wide range of data sources into big data lakes, which make data consumption simpler and faster for all stakeholders. Yet stakeholders often struggle to narrow the data down to what they need for analysis and effective decision-making: as more data arrives from ever-growing sources, users are flooded with variety. Data models ease this pain by serving insights to enterprise users after data cleansing, aggregation, and the application of business rules. However, as data models on big data platforms grow, queries and analyses must process large volumes of data and perform big joins, leading to long response and processing times. Data modelling on big data platforms therefore needs careful attention so that big data is effectively cleansed, organised, and stored, and enterprise insights remain available in a timely manner. Because scale is a critical aspect of a big data platform, data should be modelled so that the accessibility and delivery of insights are not affected as scale grows. This paper presents best practices for modelling structured and semi-structured data on a big data platform.
