Abstract

Over the last decade, ever more spatial data has been acquired on a global scale through satellite missions, social media, and coordinated governmental activities. Such observational data has a huge storage footprint, which makes global analysis challenging. Therefore, many information products have been designed in which observations are turned into global maps showing features such as land cover or land use, often with only a few discrete values and sparse spatial coverage, for example only within cities. Traditional encoding of such data as a raster image becomes challenging due to the size of the datasets and spatially non-local access patterns, for example when labeling social media streams. This article proposes GloBiMap, a randomized data structure based on Bloom filters for modeling low-cardinality sparse raster images of excessive size in a configurable amount of memory, with purely random-access operations that avoid costly intermediate decompression. In addition, the data structure is designed to correct the inevitable errors of the randomized layer, yielding a fully exact representation. We show the feasibility of the approach on several real-world datasets, including the Global Urban Footprint, in which each pixel denotes whether a particular location contains a building at a global resolution of roughly 10 m, as well as on a global Twitter sample of more than 220 million precisely geolocated tweets. Finally, we propose the integration of a denoiser engine based on artificial intelligence in order to reduce the amount of error-correction information needed for extremely compressive GloBiMaps.
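To make the construction concrete, the following is a minimal Python sketch of the core idea, not the authors' implementation: a Bloom filter answers per-pixel membership queries over the "on" pixels of a sparse binary raster, and an explicitly stored set of false positives turns the lossy answer into an exact one. Since Bloom filters produce no false negatives, correcting the false positives alone suffices for exactness. All class, method, and parameter names here (GloBiMapSketch, m_bits, k_hashes, etc.) are hypothetical choices for this illustration.

```python
import hashlib


class GloBiMapSketch:
    """Illustrative Bloom-filter raster with an exact error-correction layer."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits                      # size of the bit array
        self.k = k_hashes                    # number of hash functions
        self.bits = bytearray((m_bits + 7) // 8)
        self.false_positives = set()         # error-correction layer

    def _indices(self, x: int, y: int):
        # Derive k bit positions from the pixel coordinate via SHA-256.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{x}:{y}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, x: int, y: int) -> None:
        # Mark pixel (x, y) as "on" in the randomized layer.
        for idx in self._indices(x, y):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def _bloom_contains(self, x: int, y: int) -> bool:
        # Lossy answer: may report false positives, never false negatives.
        return all(self.bits[i // 8] >> (i % 8) & 1 for i in self._indices(x, y))

    def correct(self, x: int, y: int) -> None:
        # Record a coordinate the Bloom filter wrongly reports as set.
        self.false_positives.add((x, y))

    def contains(self, x: int, y: int) -> bool:
        # Exact random-access query: Bloom answer minus the known errors.
        return self._bloom_contains(x, y) and (x, y) not in self.false_positives


if __name__ == "__main__":
    truth = {(3, 7), (10, 20), (31, 5)}
    g = GloBiMapSketch(m_bits=64, k_hashes=2)  # deliberately tiny, so errors occur
    for p in truth:
        g.add(*p)
    # One comparison pass against the ground truth records every false
    # positive in the 32x32 domain; this is the error-correction step.
    for x in range(32):
        for y in range(32):
            if g._bloom_contains(x, y) and (x, y) not in truth:
                g.correct(x, y)
    # After correction, every random-access query is exact.
    assert all(g.contains(x, y) == ((x, y) in truth)
               for x in range(32) for y in range(32))
```

In this sketch the memory budget is set directly through m_bits, mirroring the configurable-memory property described in the abstract; the smaller the filter, the more false positives must be stored in the correction layer, which is the trade-off the proposed AI-based denoiser is meant to reduce.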
