Abstract

A point cloud is a set of points defined in a 3D metric space. Point clouds have become one of the most significant data formats for 3D representation, and they are growing in popularity thanks to the increasing availability of acquisition devices and their widening application in areas such as robotics, autonomous driving, and augmented and virtual reality. Deep learning is now the most powerful tool for data processing in computer vision and is becoming the preferred technique for tasks such as classification, segmentation, and detection. Deep learning techniques, however, are mainly applied to data with a structured grid, whereas point clouds are unstructured, which makes their direct processing with deep learning very challenging. This paper reviews recent state-of-the-art deep learning techniques, focusing mainly on those that operate on raw point cloud data. The initial work on deep learning directly with raw point cloud data did not model local regions; subsequent approaches therefore model local regions through sampling and grouping. More recently, several approaches have been proposed that not only model the local regions but also exploit the correlation between points within them. From the survey, we conclude that approaches that model local regions and take into account the correlation between points in those regions perform better. Unlike existing reviews, this paper provides a general structure for learning with raw point clouds and compares the various methods against that structure. This work also introduces the popular 3D point cloud benchmark datasets and discusses the application of deep learning in popular 3D vision tasks, including classification, segmentation, and detection.
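The ordering problem the abstract describes can be illustrated with a minimal sketch of the PointNet idea: apply a shared per-point transformation, then aggregate with a symmetric function (here max pooling), so the global feature is unaffected by the order of the points. The function name, feature width, and weights below are illustrative, not from the paper.

```python
import numpy as np

def pointnet_style_features(points, weight, bias):
    """Shared per-point MLP followed by symmetric max pooling.

    points: (N, 3) array of 3D coordinates
    weight: (3, F) shared weights; bias: (F,)
    The max over the point axis is order-invariant, so shuffling
    the input rows leaves the global feature unchanged.
    """
    per_point = np.maximum(points @ weight + bias, 0.0)  # shared ReLU layer
    return per_point.max(axis=0)                         # symmetric aggregation

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))          # a toy point cloud
W, b = rng.normal(size=(3, 16)), np.zeros(16)

feat = pointnet_style_features(cloud, W, b)
shuffled = pointnet_style_features(rng.permutation(cloud), W, b)
assert np.allclose(feat, shuffled)         # permutation invariance holds
```

Approaches that model local regions extend this scheme by applying the same symmetric aggregation within sampled neighbourhoods rather than over the whole cloud.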

Highlights

  • We live in a three-dimensional world; since the invention of the camera, visual information of the 3D world has been projected onto 2D images

  • The performance of the methods is reviewed on popular benchmark datasets: the ModelNet40 dataset [48] for classification; ShapeNet [87] and the Stanford 3D Indoor Semantics Dataset (S3DIS) [101] for part and semantic segmentation, respectively; the ScanNet [64] benchmark for 3D semantic instance segmentation; and the KITTI dataset [111,112] for object detection

  • The increasing availability of point clouds, driven by the evolution of scanning devices and coupled with their growing application in autonomous vehicles, robotics, augmented reality (AR), virtual reality (VR), etc., demands fast and efficient algorithms for processing them to achieve improved visual perception, such as recognition, segmentation, and detection

Introduction

We live in a three-dimensional world; since the invention of the camera, visual information of the 3D world has been projected onto 2D images. 3D point cloud data have become popular as a result of the increasing availability of sensing devices, especially light detection and ranging (LiDAR)-based devices such as the Tele-15 [8], Leica BLK360 [9], and Kinect V2 [10], and, more recently, mobile phones with a time-of-flight (ToF) depth camera. These sensing devices allow the easy capture of the 3D world as point clouds.
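As a brief illustration of how the depth cameras mentioned above yield point clouds, the standard pinhole back-projection converts each depth pixel into a 3D point. The intrinsics (`fx`, `fy`, `cx`, `cy`) and the toy depth image below are illustrative values, not taken from any specific device.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H, W), in metres, into an (H*W, 3)
    point cloud using the pinhole camera model:
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy example: a flat wall 2 m in front of the camera
depth = np.full((4, 4), 2.0)
pts = depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
```

The resulting `pts` array, one XYZ row per pixel, is exactly the unstructured N x 3 representation that the methods surveyed here consume directly.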

Methodology
Challenges of Deep Learning with Point Clouds
Structured Grid-Based Learning
Voxel-Based Approach
Multi-View-Based Approach
Higher-Dimensional Lattices
Deep Learning Directly with a Raw Point Cloud
PointNet
Approaches with Local Structure Computation
Approaches That Do Not Explore Local Correlation
Approaches That Explore Local Correlation
Graph-Based Approaches
Summary
Method
Benchmark Datasets
ModelNet
ShapeNet
Augmenting ShapeNet
Shape2Motion
ScanObjectNN
NYUDv2
SceneNN
ScanNet
Matterport3D
Multisensor Indoor Mapping and Positioning Dataset
ASL Dataset
Oxford Robotcar
Semantic3D
Apollo
WHU-TLS
Application of Deep Learning in 3D Vision Tasks
Classification
Segmentation
Object Detection
Findings
Summary and Conclusions