Abstract

While most research in automatic semantic segmentation of 3D geospatial point clouds is concerned with enhancing the respective Machine Learning (ML) models, we aim to shift the focus toward a more data-centric perspective. That is, we consider the creation of the data sets that ML models learn from to be a key component, since even the most sophisticated model performs poorly when learning from suboptimal data. In this regard, the straightforward approach of providing abundant labeled data is prohibitively expensive and does not scale in times of high-frequency data acquisition cycles, where a dedicated training set should be available for each new epoch because ML models often lack generalizability. As a remedy, we rely on Active Learning (AL), a cost-efficient and quick method for generating the required training data at scale. Although AL has been applied (albeit rarely) in the geospatial domain before, a comprehensive evaluation of its capabilities, including benchmarking of achievable accuracies, is lacking. Therefore, we apply the AL concept to both of ISPRS’ current point cloud benchmark data sets and to a third, large-scale National Mapping Agency point cloud. The experiments are conducted with both a feature-driven Random Forest classification approach and a data-driven Submanifold Sparse Convolutional Neural Network classifier. Our experiments verify that by labeling only a fraction of the available training points (typically ≪ 1%), we can still reach accuracies that are at most about 5 percentage points below those of leading benchmark contributions.
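To make the AL idea concrete, the following is a minimal sketch of a pool-based loop with entropy-based uncertainty sampling. The abstract does not state which query strategy, batch size, or stopping rule is used, so all of these are assumptions chosen for illustration. The names `fit`, `predict_proba`, and `oracle` are hypothetical callbacks standing in for any classifier (for example a Random Forest) and for the human annotator.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, batch_size):
    """Return positions of the `batch_size` most uncertain pool samples.

    Uncertainty is measured by predictive entropy; this is an assumed
    query strategy, not necessarily the one used in the paper.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def active_learning_loop(fit, predict_proba, features, oracle,
                         seed_idx, batch_size, iterations):
    """Iteratively grow a labeled set from an unlabeled pool.

    fit(X, y) -> model                  (hypothetical callback)
    predict_proba(model, X) -> probs    (hypothetical callback)
    oracle(i) -> label of sample i      (stands in for the annotator)
    """
    # Start from a small seed set labeled by the oracle.
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(iterations):
        # Train on everything labeled so far.
        model = fit([features[i] for i in labeled],
                    [labeled[i] for i in labeled])
        # Everything not yet labeled forms the pool.
        pool = [i for i in range(len(features)) if i not in labeled]
        if not pool:
            break
        probs = predict_proba(model, [features[i] for i in pool])
        # Query labels only for the most uncertain pool samples.
        for j in select_queries(probs, batch_size):
            labeled[pool[j]] = oracle(pool[j])
    return labeled
```

In such a loop, only the queried points ever receive labels, which is how the labeled share of the training data can stay very small.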
