Abstract

This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a workhorse, which already show remarkable performance improvements over the state of the art. CNNs have become the de-facto standard for many tasks in computer vision and machine learning, like semantic segmentation or object detection in images, but have not yet led to a true breakthrough for 3D point cloud labelling tasks due to a lack of training data. With the massive data set presented in this paper, we aim to close this data gap and help unleash the full potential of deep learning methods for 3D labelling tasks. Our semantic3D.net data set consists of dense point clouds acquired with static terrestrial laser scanners. It contains 8 semantic classes and covers a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. We describe our labelling interface and show that our data set provides denser and more complete point clouds, with a much higher overall number of labelled points, than those already available to the research community. We further provide baseline method descriptions and a comparison of the methods submitted to our online system. We hope semantic3D.net will pave the way for deep learning methods in 3D point cloud labelling to learn richer, more general 3D representations, and first submissions after only a few months indicate that this might indeed be the case.
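To make the data layout concrete, below is a minimal Python sketch of loading one labelled scan. It assumes the common plain-text convention of one point per line with seven columns (x, y, z, intensity, r, g, b) and a companion .labels file holding one integer class id per point, with 0 marking unlabelled points; the file names and exact column layout are illustrative assumptions, not specified in this abstract.

    import numpy as np

    # Minimal sketch: read a labelled terrestrial laser scan.
    # Assumed layout (not specified in this abstract): each line of the
    # point file holds "x y z intensity r g b"; the companion .labels
    # file holds one integer class id per point, 0 = unlabelled.
    def load_labelled_scan(points_path, labels_path):
        points = np.loadtxt(points_path)                   # shape (N, 7)
        labels = np.loadtxt(labels_path, dtype=np.int64)   # shape (N,)
        assert len(points) == len(labels), "point/label count mismatch"
        keep = labels > 0                                  # drop unlabelled points
        return points[keep, :3], labels[keep]              # xyz and class ids

    # Hypothetical file names, for illustration only.
    xyz, y = load_labelled_scan("scan.txt", "scan.labels")
    print(len(xyz), "labelled points; classes present:", np.unique(y))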

Highlights

  • Deep learning has made a spectacular comeback since the seminal paper of Krizhevsky et al. (2012), which revived earlier work of Fukushima (1980) and LeCun et al. (1989)

  • What makes supervised learning hard for 3D point clouds is the sheer size of millions of points per data set, and the irregular, non-grid-aligned, and in places very sparse structure, with strongly varying point density (Figure 1)

  • Benchmarking efforts have a long tradition in the geospatial data community and in ISPRS


Summary

INTRODUCTION

Deep learning has made a spectacular comeback since the seminal paper of Krizhevsky et al. (2012), which revived earlier work of Fukushima (1980) and LeCun et al. (1989). The large majority of state-of-the-art methods in computer vision and machine learning include CNNs as one of their essential components. Their success for image-interpretation tasks is mainly due to (i) parallelisable network architectures that facilitate training from millions of images on a single GPU and (ii) the availability of huge public benchmark data sets like ImageNet (Deng et al., 2009; Russakovsky et al., 2015) and Pascal VOC (Everingham et al., 2010) for RGB images, or SUN RGB-D (Song et al., 2015) for RGB-D data. Due to the additional dimension, the number of classifier parameters is larger in 3D space than in 2D, and specific 3D effects like occlusion or variations in point density lead to many different patterns for identical output classes. This complicates training good, general classifiers, and we generally need more training data in 3D than in 2D. First submissions have been made to the benchmark, which we briefly discuss.
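As a back-of-the-envelope illustration of this parameter growth (the kernel size and channel counts below are illustrative assumptions, not numbers from the paper), a single convolution layer's weight count grows by another factor of the kernel size when moving from 2D to 3D:

    # Parameter count of one convolution layer, 2D vs. 3D.
    # Kernel size and channel counts are illustrative assumptions.
    k, c_in, c_out = 3, 64, 64
    params_2d = k * k * c_in * c_out           # 3x3 kernel:    36,864 weights
    params_3d = k * k * k * c_in * c_out       # 3x3x3 kernel: 110,592 weights
    print(params_3d / params_2d)               # 3.0 -> k times more parameters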

RELATED WORK
OBJECTIVE
Point Cloud Annotation
EVALUATION
BENCHMARK STATISTICS
CONCLUSION AND OUTLOOK
