Abstract

High-precision real-time 3D object detection based on the LiDAR point cloud is an important task for autonomous driving. Most existing methods utilize grid-based convolutional networks to handle sparse and cluttered point clouds. However, detection performance is limited by coarse grid quantization and expensive computational cost. In this paper, we propose a more efficient representation of 3D point clouds and introduce SCNet, a single-stage, end-to-end 3D subdivision coding network that learns finer feature representations for vertical grids. SCNet divides each grid into smaller sub-grids to preserve more point cloud information and converts the points in each grid to a uniform feature representation through 2D convolutional neural networks. The 3D point cloud is encoded as a fine 2D sub-grid representation, which helps to reduce the computational cost. We validate our SCNet on the KITTI object benchmark and show that the proposed object detector produces state-of-the-art results at more than 20 FPS.
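The subdivision coding described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid extent, cell size, and the choice of max point height as the sub-cell feature are assumptions for the example.

```python
import numpy as np

def subdivision_encode(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                       cell=0.4, sub=2):
    """Encode an (N, 3) point cloud into a bird's-eye-view grid whose
    vertical cells are split into sub x sub sub-grids (hypothetical
    parameters; the paper's exact resolution may differ).

    Returns an (H, W, sub*sub) tensor with one channel per sub-grid,
    holding the maximum point height observed in that sub-cell.
    """
    H = int((x_range[1] - x_range[0]) / cell)
    W = int((y_range[1] - y_range[0]) / cell)
    feat = np.zeros((H, W, sub * sub), dtype=np.float32)

    # Keep only points inside the grid extent.
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[m]

    # Continuous cell coordinates: integer part = cell, fraction = sub-cell.
    cx = (pts[:, 0] - x_range[0]) / cell
    cy = (pts[:, 1] - y_range[0]) / cell
    ix, iy = cx.astype(int), cy.astype(int)
    sx = np.minimum((np.modf(cx)[0] * sub).astype(int), sub - 1)
    sy = np.minimum((np.modf(cy)[0] * sub).astype(int), sub - 1)
    ch = sx * sub + sy

    for i in range(len(pts)):
        feat[ix[i], iy[i], ch[i]] = max(feat[ix[i], iy[i], ch[i]], pts[i, 2])
    return feat
```

The resulting (H, W, sub*sub) map keeps finer spatial detail than a plain (H, W) grid while remaining a regular 2D input that ordinary 2D convolutions can process.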

Highlights

  • Great progress has been made on object detection for autonomous driving

  • Much work has been done based on images, especially using convolutional neural networks (CNNs) [2], [3]

  • We propose a novel method called subdivision coding network (SCNet), which performs 3D object detection based on the 2D convolutional network


Summary

INTRODUCTION

Great progress has been made on object detection for autonomous driving. The unmanned vehicle relies on the environmental perception system to obtain external information, in which the LiDAR and camera are the major sensors. Since the convolution operator in deep learning requires the input data to have a regular format, the point cloud is usually converted to 3D voxel grids. Besides the 3D voxel grid representation, the MaxPooling operator can transform the unordered point cloud into a regular format; this idea was first proposed by Qi et al. [10]. We propose a novel method called SCNet, which performs 3D object detection based on the 2D convolutional network. The grid coding of the bird's eye view ensures that we can perform 2D convolutions efficiently and utilize priors on object shape and size. We propose a novel end-to-end neural network architecture for 3D object detection based on the point cloud. We evaluate our SCNet on the KITTI benchmark and show that the results of our method are comparable to the state-of-the-art methods while being much faster.
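The max-pooling idea from Qi et al. [10] can be sketched as follows: a shared per-point MLP followed by a max over points gives a global feature that is invariant to point ordering. The weights here are random stand-ins and the function name is illustrative, not an API from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 16))  # stand-in weights for a shared per-point MLP

def point_feature(points):
    """Order-invariant global feature: apply the same MLP to every point,
    then max-pool over the point dimension (the symmetric function)."""
    h = np.maximum(points @ W1, 0.0)   # shared per-point MLP with ReLU
    return h.max(axis=0)               # max over N points -> (16,) feature

pts = rng.standard_normal((128, 3))
perm = rng.permutation(128)
# Shuffling the points leaves the pooled feature unchanged.
assert np.allclose(point_feature(pts), point_feature(pts[perm]))
```

Because the max is taken over the point dimension, any permutation of the input rows yields the same output, which is what makes the raw, unordered point cloud usable as network input.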

