Towards general-purpose representation learning of polygonal geometries

Gengchen Mai, Ling Cai, Krzysztof Janowicz, Weiwei Sun, Chiyu Jiang, Rui Zhu, Yao Xuan, Stefano Ermon, Ni Lao

doi:10.1007/s10707-022-00481-2

Abstract

Neural network representation learning for spatial data (e.g., points, polylines, polygons, and networks) is a common need for geographic artificial intelligence (GeoAI) problems. In recent years, many advancements have been made in representation learning for points, polylines, and networks, whereas little progress has been made for polygons, especially complex polygonal geometries. In this work, we focus on developing a general-purpose polygon encoding model, which can encode a polygonal geometry (with or without holes, single or multipolygons) into an embedding space. The resulting embeddings can be used directly (or fine-tuned) for downstream tasks such as shape classification, spatial relation prediction, building pattern classification, and cartographic building generalization. To achieve model generalizability guarantees, we identify a few desirable properties that the encoder should satisfy: loop origin invariance, trivial vertex invariance, part permutation invariance, and topology awareness. We explore two different designs for the encoder: one derives all representations in the spatial domain and can naturally capture local structures of polygons; the other leverages spectral domain representations and can easily capture global structures of polygons. For the spatial domain approach, we propose ResNet1D, a 1D CNN-based polygon encoder, which uses circular padding to achieve loop origin invariance on simple polygons. For the spectral domain approach, we develop NUFTspec based on the Non-Uniform Fourier Transformation (NUFT), which naturally satisfies all the desired properties. We conduct experiments on two different tasks: (1) polygon shape classification based on the commonly used MNIST dataset, and (2) polygon-based spatial relation prediction based on two new datasets (DBSR-46K and DBSR-cplx46K) constructed from OpenStreetMap and DBpedia. Our results show that NUFTspec and ResNet1D outperform multiple existing baselines by significant margins. While ResNet1D suffers from performance degradation after shape-invariant geometry modifications, NUFTspec is very robust to these modifications due to the nature of the NUFT representation. NUFTspec is able to jointly consider all parts of a multipolygon and their spatial relations during prediction, while ResNet1D can recognize shape details that are sometimes important for classification. This result points to a promising research direction of combining spatial and spectral representations.
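To make the link between circular padding and loop origin invariance concrete, the following is a minimal sketch in plain Python. It is not the ResNet1D architecture from the paper: the single hand-picked kernel, the global max pooling step, and the function names (`circular_conv1d`, `encode`, `roll`) are all assumptions made for illustration. The idea is that a convolution with wrap-around padding treats the vertex list as a closed loop, so shifting the starting vertex only shifts the feature sequence, and an order-independent pooling step then removes the shift.

```python
def circular_conv1d(signal, kernel):
    """1D cross-correlation with circular (wrap-around) padding.

    The output has the same length as the input, and the vertex after the
    last one is taken to be the first one, as in a closed polygon ring.
    """
    n, k = len(signal), len(kernel)
    half = k // 2
    return [
        sum(kernel[j] * signal[(i + j - half) % n] for j in range(k))
        for i in range(n)
    ]


def roll(seq, shift):
    """Cyclically shift a list so that element `shift` becomes the first."""
    shift %= len(seq)
    return seq[shift:] + seq[:shift]


def encode(vertices, kernel):
    """Toy encoder (illustrative assumption, not the paper's model).

    `vertices` is the exterior ring of a simple polygon as (x, y) pairs,
    without a repeated closing vertex. Each coordinate channel is filtered
    with circular padding, then reduced by global max pooling.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    fx = circular_conv1d(xs, kernel)
    fy = circular_conv1d(ys, kernel)
    # Max pooling ignores position, so the starting vertex no longer matters.
    return (max(fx), max(fy))


polygon = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 3.0), (0.0, 1.0)]
kernel = [0.5, -1.0, 0.25]  # arbitrary weights chosen for the example

reference = encode(polygon, kernel)
# Re-encode the same polygon starting from every possible vertex.
shifted = [encode(roll(polygon, s), kernel) for s in range(len(polygon))]
print(all(e == reference for e in shifted))
```

With zero padding instead of wrap-around indexing, the features near the two ends of the vertex list would depend on where the loop was cut, so the same check would generally fail.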

Full Text