Abstract

The construction of systems supporting spatial data has attracted great enthusiasm in the past, owing to the richness of this type of data and its semantics, which can support decision-making in various fields. The problem of integrating spatial data into existing databases and information systems has thus been addressed by creating spatial extensions to relational tables or by building spatial data warehouses, while adapting data structures and query languages to make them more spatially aware. With the advent of Big Data, these conventional storage and spatial representation structures are becoming increasingly outdated and require a new organization of spatial data. Approaches based on distributed storage and data lakes have been proposed to integrate the complexity of spatial data with operational and analytical systems, but these approaches unfortunately quickly showed their limits. Recently, the concept of the lakehouse was introduced to bring, among other things, reliability and ACID properties to the volume of data to be managed. This new data architecture combines governed, reliable data warehouses with flexible, scalable, and cost-effective data lakes. In this paper, we show how traditional approaches to spatial data management quickly reached their limits in the context of spatial big data. We present a literature overview of these approaches and explain how they led to the Data Lakehouse. We detail how the lakehouse paradigm can be used and extended to manage spatial big data, describing the different components and best practices for building a spatial Data Lakehouse architecture optimized for storage and computation over spatial big data.

