Abstract

The era of big data is evolving with the introduction of the data lake concept. While a data warehouse provides a well-structured model for managing big data, a data lake accepts data of any type and format, with or without a schema, and makes the data accessible to diverse communities of users. A data lake provides a flexible, agile, and scalable solution for managing the ever-increasing volume of big data, including the many siloed datasets that researchers have collected over the years through Arctic expeditions. In this paper, we present our conceptual model of a data lake for integrating the large and diverse body of data collected by researchers during Arctic expeditions. We also design baseline metadata, using a data-driven approach, to manage the large and disparate structured, semi-structured, and unstructured data collected from the Arctic region. The resulting open data lake not only manages big Arctic data effectively but also supports machine learning on these data.

