Abstract

Semantic interpretation of multi-modal datasets is of great importance in many domains of geospatial data analysis. However, training models for automated semantic segmentation requires labeled training data, and in the case of multi-modality, for each representation form of the scene. To completely avoid the time-consuming and cost-intensive involvement of an expert in the annotation procedure, we propose an Active Learning (AL) pipeline in which a Random Forest classifier selects a subset of points sufficient for training and the necessary labels are obtained from the crowd. Within this AL loop, we aim at coupled semantic segmentation of an Airborne Laser Scanning (ALS) point cloud and the corresponding 3D textured mesh generated from LiDAR data and imagery in a hybrid manner. We pursue two main objectives: i) we evaluate the performance of the AL pipeline applied to an ultra-high resolution ALS point cloud and a derived textured mesh (both benchmark datasets are available at https://ifpwww.ifp.uni-stuttgart.de/benchmark/hessigheim/default.aspx); ii) we investigate the capabilities of the crowd regarding the interpretation of 3D geodata and observe that the crowd performs about 3 percentage points better when labeling meshes compared to point clouds. We additionally demonstrate that labels received solely from the crowd can power a machine learning system whose Overall Accuracy differs by less than 2 percentage points for the point cloud and less than 3 percentage points for the mesh, compared to using the completely labeled training pool. For deriving this sparse training set, we ask the crowd to label 0.25 % of the available training points, resulting in costs of $190.
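
To make the pipeline described above concrete, the following Python sketch shows one plausible pool-based AL loop: a Random Forest is retrained on a growing labeled subset, and the most uncertain points are sent to an oracle for labeling. This is a minimal illustration of ours, not the authors' implementation; the entropy-based query score, the batch size, the number of trees, and the programmatic oracle_label stand-in (in the paper, labels come from paid crowdworkers) are all assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def active_learning_loop(pool_features, oracle_label, seed_idx,
                             iterations=10, batch_size=100):
        # Pool-based AL sketch: retrain a Random Forest on the growing
        # labeled subset and query the most uncertain points from the
        # oracle (here a callable; in the paper, the crowd).
        labeled = list(seed_idx)
        labels = {i: oracle_label(i) for i in labeled}
        for _ in range(iterations):
            clf = RandomForestClassifier(n_estimators=100)
            clf.fit(pool_features[labeled],
                    np.array([labels[i] for i in labeled]))
            # Entropy of the predicted class distribution as uncertainty score
            proba = clf.predict_proba(pool_features)
            entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
            entropy[labeled] = -np.inf   # exclude already-labeled points
            for i in np.argsort(entropy)[-batch_size:]:
                labels[int(i)] = oracle_label(int(i))
                labeled.append(int(i))
        return clf, labeled

In an actual campaign, oracle_label would correspond to posting the selected points (or mesh faces) as paid microtasks to a crowdsourcing platform rather than calling a local function.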

Highlights

  • In recent years, significant effort was put into developing and advancing automatic Machine Learning (ML) methods such as Convolutional Neural Networks (CNNs) for various data representations such as 2D imagery (Ronneberger et al., 2015; Badrinarayanan et al., 2017) or 3D point clouds (Qi et al., 2017; Graham et al., 2018).

  • We aim to evaluate whether we can ease the interpretation of sampled points by further applying the method proposed in Kölle et al. (2021), denoted as Reducing Interpretation Uncertainty (RIU).

  • We first discuss the conducted experiments relying on real crowdworkers and give some details regarding the crowd campaigns, which ran in parallel to our Active Learning (AL) loops.

Introduction

Significant effort was put into developing and advancing automatic Machine Learning (ML) methods such as Convolutional Neural Networks (CNNs) for various data representations such as 2D imagery (Ronneberger et al., 2015; Badrinarayanan et al., 2017) or 3D point clouds (Qi et al., 2017; Graham et al., 2018). Tremendous effort was also invested in establishing massive annotated data corpora such as ImageNet (Deng et al., 2009). Since manual annotation of about 14 million images by experts is infeasible, this dataset was mainly built up by the available workforce of individual crowdworkers on the internet. Compared to annotating images of everyday scenes, the interpretation of geospatial data by non-experts (i.e., the crowd) is far more demanding due to the unfamiliar perspective (i.e., a nadir-like bird's-eye view). This complexity is further intensified when focusing on 3D data, which non-experts may never have dealt with before. When a semantic segmentation of 3D data is desired, working directly with the original data is most reasonable in order to avoid the loss of information caused, for instance, by projection to a lower-dimensional space. Crowd-based interpretation of such geospatial data has already been investigated by Herfort et al. (2018) and Walter and Soergel (2018).
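
The information loss caused by projection can be made tangible with a small synthetic sketch of ours (not from the paper): it rasterizes a random XYZ point cloud into 2D grid cells and counts how many points become indistinguishable after the 3D-to-2D projection. The point count, extent, and cell size are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    points = rng.uniform(0, 100, size=(10_000, 3))   # synthetic XYZ point cloud

    cell = 1.0                                        # raster cell size in metres
    ij = np.floor(points[:, :2] / cell).astype(int)   # project to 2D grid indices
    _, counts = np.unique(ij, axis=0, return_counts=True)

    # Every point beyond the first in a cell is no longer separable in 2D
    lost = points.shape[0] - counts.size
    print(f"{lost} of {points.shape[0]} points collapse onto an occupied cell")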
