Abstract

Semantic segmentation is used to enable a computer to understand its surrounding environment. In image processing, images are partitioned into segments for this purpose. State-of-the-art methods use Convolutional Neural Networks to segment a 2D image. In comparison, 3D approaches suffer from high computational cost and are not applicable without further steps. In this work, we focus on semantic segmentation based on 3D point clouds. We project the 3D data into a 2D image to accelerate the segmentation process; afterward, the processed image is re-projected to obtain the desired result. We investigate different projection views and compare them to clarify their strengths and weaknesses. To compensate for projection errors and the loss of geometric information, we extend the approach and show how to fuse different views. We decided to fuse the bird's-eye and the spherical projection, as each of them achieves reasonable results and the two perspectives complement each other best. For training and evaluation, we use the real-world dataset SemanticKITTI. Further, we use the ParisLille dataset and synthetic data generated by the simulation framework Carla to analyze the approaches in more detail. Although these methods achieve reasonable and competitive results, they lack flexibility: they depend on the sensor used and the setup in which it is mounted.
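To illustrate the projection step the abstract describes, the sketch below shows one common form of the spherical (range-image) projection of a LiDAR point cloud. It is a minimal sketch, not the paper's implementation: the image size and the vertical field of view (+3/-25 degrees, typical for a Velodyne HDL-64E as used in SemanticKITTI) are assumed defaults.

```python
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud onto an (h, w) range image.

    Image size and vertical field of view are illustrative
    assumptions (typical HDL-64E values), not values from the paper.
    """
    depth = np.linalg.norm(points, axis=1)
    valid = depth > 0                        # drop points at the sensor origin
    points, depth = points[valid], depth[valid]
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    yaw = np.arctan2(y, x)                   # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)             # elevation

    fov_down = np.radians(fov_down_deg)
    fov = abs(np.radians(fov_up_deg)) + abs(fov_down)

    # Normalise the angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w                # column
    v = (1.0 - (pitch + abs(fov_down)) / fov) * h    # row
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Write the farthest points first so closer points overwrite them.
    order = np.argsort(depth)[::-1]
    image = np.full((h, w), -1.0, dtype=np.float32)
    image[v[order], u[order]] = depth[order]
    return image, (v, u)   # pixel indices are kept for the re-projection step
```

The returned pixel indices make the re-projection trivial: after the 2D network has labeled the image, each point reads its label back from the pixel it was projected to.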

Highlights

  • With an increasing number of 3D sensors available, such as Light Detection and Ranging (LiDAR), the demand for 3D data processing is growing

  • All blocks with K = 1 that are followed by another convolution reduce the number of feature channels by a factor of 8 compared to the desired overall output channels (see the first sketch after this list)

  • Most of the work published so far uses Nearest Neighbor (NN) methods to deal with the projection error (see the second sketch after this list)
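The first highlight describes a channel-reducing block built around a K = 1 (1x1) convolution. The paper's exact block layout is not reproduced here, so the following is a hypothetical sketch of such a bottleneck: the 1x1 convolution shrinks the channels to one eighth of the target width before a second convolution produces the desired output channels.

```python
import torch
import torch.nn as nn

class KOneBlock(nn.Module):
    """Hypothetical bottleneck reflecting the highlight: a K = 1 conv
    reduces the channels to out_channels // 8, then a following conv
    restores the desired output width. A sketch, not the paper's block."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        reduced = out_channels // 8   # factor-of-8 reduction from the highlight
        self.reduce = nn.Conv2d(in_channels, reduced, kernel_size=1)
        self.expand = nn.Conv2d(reduced, out_channels, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.expand(self.act(self.reduce(x))))

# Example: 64 input channels, 128 output channels, with only 16 channels
# flowing between the two convolutions.
out = KOneBlock(64, 128)(torch.randn(1, 64, 32, 32))
```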
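The third highlight refers to Nearest Neighbor post-processing of the projection error: points that fall into the same pixel receive the same label, so labels are refined by voting over nearby points in 3D. The sketch below is a simplified stand-in for this idea (KNN post-processing of this kind is used, for example, by RangeNet++); the paper's exact variant may differ, and `nn_label_transfer` is a name chosen here for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_label_transfer(points, labeled_points, labels, k=5):
    """Give every 3D point the majority label of its k nearest
    labeled points. Simplified KNN post-processing sketch."""
    tree = cKDTree(labeled_points)
    _, idx = tree.query(points, k=k)        # (N, k) neighbor indices
    neighbor_labels = labels[idx]           # (N, k) candidate labels
    # Majority vote per point.
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])
```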


Summary

Introduction

With an increasing number of 3D sensors available, such as Light Detection and Ranging (LiDAR), the demand for 3D data processing is growing. Understanding a scene is a challenging task for a computer; it can be divided into classification, object recognition, semantic segmentation, and instance segmentation. Instead of working with a single view, we examine the spherical, bird's-eye, and cylindrical views. Stating their advantages and disadvantages allows us to compare the views with each other and shows how to improve them. We also analyze how well the approaches generalize, applying them to unseen data, to a new sensor with a different setup, and to synthetically generated data. The contributions of this work are:

  • comparing different projection-based methods with each other to highlight their advantages and disadvantages,
  • improving the performance of the regular bird's-eye and cylindrical views (a sketch of the bird's-eye discretisation follows this list), and
  • proposing methods to fuse multiple projections with each other to improve the overall performance.
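For contrast with the spherical projection shown earlier, the sketch below illustrates the bird's-eye view: the point cloud is discretised onto a top-down grid. The ranges, the 0.2 m cell size, and the choice of a height map as the cell feature are illustrative assumptions, not values from the paper.

```python
import numpy as np

def birds_eye_projection(points, x_range=(-50.0, 50.0),
                         y_range=(-50.0, 50.0), res=0.2):
    """Discretise an (N, 3) point cloud into a top-down height map.
    Ranges and cell size are illustrative assumptions."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & \
           (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]

    cols = ((x - x_range[0]) / res).astype(np.int32)
    rows = ((y - y_range[0]) / res).astype(np.int32)
    h = int(round((y_range[1] - y_range[0]) / res))
    w = int(round((x_range[1] - x_range[0]) / res))

    # Keep the highest point per cell; empty cells stay at 0.
    image = np.full((h, w), -np.inf, dtype=np.float32)
    np.maximum.at(image, (rows, cols), z)
    image[np.isinf(image)] = 0.0
    return image
```

Unlike the spherical view, the bird's-eye view preserves metric distances on the ground plane but collapses the vertical axis, which is one reason the two perspectives complement each other when fused.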

Related Work
Spherical View
Bird’s-Eye View
Cylindrical View
Fusion
Baseline
KPConv Fusion
PointNet Fusion
Nearest Neighbor Fusion
Datasets
Training Details
Cylindrical View
Fused Projection
Comparison
Generalization Analyses
Findings
Conclusions