Abstract

Analyzing and extracting geometric features from 3D data is a fundamental step in 3D scene understanding. Recent works have demonstrated that deep learning architectures can operate directly on raw point clouds, i.e. without the use of intermediate grid-like structures. These architectures, however, are not designed to efficiently encode contextual information between objects. Inspired by a global feature aggregation algorithm designed for images (Zhao et al., 2017), we propose a 3D pyramid module to enrich pointwise features with multi-scale contextual information. Our module can easily be coupled with 3D semantic segmentation methods operating on 3D point clouds. We evaluated our method on three large-scale datasets with four baseline models. Experimental results show that the use of enriched features brings significant improvements to the semantic segmentation of indoor and outdoor scenes.

Highlights

  • The semantic segmentation of 3D point clouds is an important problem in 3D computer vision, in particular for autonomous driving, robotics and augmented reality (Tchapmi et al., 2017)

  • The goal of our work is not to achieve state-of-the-art performance on the datasets, but to propose a generic module that can be concatenated with any 3D neural network to infer richer pointwise features

  • Overview: we propose a generic 3d-PSPNet module that can be concatenated after any pointwise-feature-based approach (a simplified sketch of the idea is given after this list)

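This page does not reproduce the module's implementation, so the snippet below is only a minimal sketch of the underlying idea, not the authors' code: pointwise features are average-pooled over coarse 3D grids at several resolutions, and the pooled context is concatenated back onto each point. The function name pyramid_enrich, the choice of grid scales and the NumPy-based pooling are our own illustrative assumptions.

```python
# Minimal sketch of multi-scale context pooling over pointwise features
# (illustrative only; not the authors' implementation).
import numpy as np

def pyramid_enrich(points, feats, scales=(1, 2, 4)):
    """points: (N, 3) xyz coordinates, feats: (N, C) pointwise features.
    Returns (N, C * (1 + len(scales))) features enriched with
    average-pooled context at several grid resolutions."""
    mins, maxs = points.min(0), points.max(0)
    extent = np.maximum(maxs - mins, 1e-6)
    outputs = [feats]
    for s in scales:
        # Assign each point to a cell of an s x s x s grid over the scene.
        cell = np.floor((points - mins) / extent * s).clip(0, s - 1).astype(int)
        cell_id = (cell[:, 0] * s + cell[:, 1]) * s + cell[:, 2]
        # Average the features of all points falling into the same cell.
        pooled = np.zeros((s ** 3, feats.shape[1]))
        counts = np.zeros(s ** 3)
        np.add.at(pooled, cell_id, feats)
        np.add.at(counts, cell_id, 1)
        pooled /= np.maximum(counts, 1)[:, None]
        # Broadcast the cell-level context back to every point.
        outputs.append(pooled[cell_id])
    return np.concatenate(outputs, axis=1)

# Toy usage: 1000 random points with 8-dimensional features -> (1000, 32).
pts = np.random.rand(1000, 3)
f = np.random.rand(1000, 8)
enriched = pyramid_enrich(pts, f)
```

In the actual module the pooled features would be produced and fused by learned layers inside the network; this sketch only illustrates the multi-scale aggregation pattern that enriches each point with scene-level context.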

Introduction

The semantic segmentation of 3D point clouds is an important problem in 3D computer vision, in particular for autonomous driving, robotics and augmented reality (Tchapmi et al., 2017). The success of deep learning in image analysis (Long et al., 2015; Badrinarayanan et al., 2015; Chen et al., 2018) has drawn considerable attention to 3D scene understanding. An alternative approach is to first convert point clouds into an intermediate grid-like representation before exploiting CNNs. Such representations can take the form of multi-view images (Su et al., 2015; Kalogerakis et al., 2017; Boulch et al., 2018) or voxel grids.
