Abstract

Currently, there are many kinds of voxel-based multisensor 3D object detectors, while point-based multisensor 3D object detectors have not been fully studied. In this paper, we propose a new two-stage 3D object detection method based on point cloud and image fusion to improve detection accuracy. To address the insufficient semantic information of point clouds, we perform multiscale deep fusion of LiDAR points and camera images in a point-wise manner to enhance point features. Because LiDAR points are unevenly distributed, the point clouds of distant objects are sparse. We therefore design a point cloud completion module that predicts the spatial shape of objects in the candidate boxes and extracts structural information to improve feature representation and further refine the boxes. The framework is evaluated on the widely used KITTI and SUN RGB-D datasets. Experimental results show that our method outperforms all state-of-the-art point-based 3D object detection methods and performs comparably to voxel-based methods.
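To make the point-wise fusion concrete, here is a minimal PyTorch sketch that samples image features at projected point locations and concatenates them with point features. It assumes the 3D-to-2D projection and an image backbone already exist; the function name, shapes, and the plain concatenation step are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of point-wise LiDAR-image feature fusion, assuming
# points are already projected to 2D image coordinates (uv) and an image
# backbone has produced a feature map. Names are illustrative only.
import torch
import torch.nn.functional as F

def fuse_point_image_features(point_feats, image_feats, uv, image_size):
    """
    point_feats: (N, Cp)  per-point features from a point-based backbone
    image_feats: (1, Ci, H, W)  feature map from an image CNN
    uv:          (N, 2)   pixel coordinates of each projected point
    image_size:  (W_img, H_img)  original image resolution
    Returns (N, Cp + Ci) fused features.
    """
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    w, h = image_size
    grid = uv.clone()
    grid[:, 0] = 2.0 * uv[:, 0] / (w - 1) - 1.0
    grid[:, 1] = 2.0 * uv[:, 1] / (h - 1) - 1.0
    grid = grid.view(1, 1, -1, 2)                      # (1, 1, N, 2)

    # Bilinearly sample one image feature vector per projected point.
    sampled = F.grid_sample(image_feats, grid, mode='bilinear',
                            align_corners=True)        # (1, Ci, 1, N)
    sampled = sampled.squeeze(0).squeeze(1).t()        # (N, Ci)

    # Point-wise fusion by concatenation; a learned gating or MLP could
    # follow, in the spirit of EPNet-style fusion modules.
    return torch.cat([point_feats, sampled], dim=1)
```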

Highlights

  • Continuous convolution or bilinear interpolation is used to correct the alignment between modalities and overcome the challenge of differing sensor perspectives, but quantizing the 3D structure of the point cloud into bird's-eye-view (BEV) pseudoimages before fusing image features inevitably loses accuracy (see the projection sketch after this list). Some research works [12, 13] use 3D frustums projected from 2D bounding boxes to estimate 3D bounding boxes, but these methods require additional 2D annotations and their performance is limited by the 2D detectors. The above multisensor feature fusion methods all transform point clouds from a sparse formation to a compact representation by projecting them onto images or subdividing them into uniformly distributed voxels

  • (3) We propose a new two-stage 3D object detection framework based on point cloud and image fusion. The test results on the KITTI benchmark show that the accuracy of our method is higher than that of all current multisensor-based 3D object detection methods

  • It can be seen that our method outperforms all advanced point-based multisensor methods, F-PointNet [12], IDMOD [33], PI-RCNN [14], and EPNet [15], by 10.84%, 5.45%, 5.29%, and 0.47%, respectively
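The alignment step mentioned in the first highlight rests on projecting LiDAR points into the camera image. Below is a minimal NumPy sketch using KITTI's standard calibration matrices (P2, R0_rect, Tr_velo_to_cam); this is the conventional KITTI projection pipeline, not code from the paper.

```python
# Project LiDAR points into the left color camera image on KITTI,
# using the dataset's standard calibration matrices. Variable names
# follow the KITTI calibration-file convention.
import numpy as np

def project_lidar_to_image(pts_lidar, P2, R0_rect, Tr_velo_to_cam):
    """
    pts_lidar:      (N, 3) points in the LiDAR frame
    P2:             (3, 4) left color camera projection matrix
    R0_rect:        (3, 3) rectification rotation
    Tr_velo_to_cam: (3, 4) LiDAR-to-camera rigid transform
    Returns (N, 2) pixel coordinates and (N,) depths.
    """
    n = pts_lidar.shape[0]
    pts_h = np.hstack([pts_lidar, np.ones((n, 1))])      # homogeneous (N, 4)

    # LiDAR frame -> camera frame -> rectified camera frame.
    pts_cam = Tr_velo_to_cam @ pts_h.T                   # (3, N)
    pts_rect = R0_rect @ pts_cam                         # (3, N)

    # Rectified 3D -> image plane via the camera projection matrix.
    pts_rect_h = np.vstack([pts_rect, np.ones((1, n))])  # (4, N)
    pts_img = P2 @ pts_rect_h                            # (3, N)
    depth = pts_img[2]
    uv = (pts_img[:2] / depth).T                         # (N, 2)
    return uv, depth

# Points with depth <= 0 lie behind the camera and should be discarded
# before sampling image features at their projected locations.
```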


Summary

Introduction

Some research works [12, 13] use 3D frustums projected from 2D bounding boxes to estimate 3D bounding boxes, but these methods require additional 2D annotations and their performance is limited by the 2D detectors. The above multisensor feature fusion methods all transform point clouds from a sparse formation to a compact representation by projecting them onto images or subdividing them into uniformly distributed voxels. We call these voxel-based multimodal feature fusion methods, since they voxelize the entire point cloud. To improve detection accuracy on difficult cases, SIENet [17] predicts the shape of distant objects with a point completion network to enhance spatial structure information. Inspired by this line of work (EPNet [15] and SIENet [17]), this paper proposes a point-based multimodal fusion 3D object detection method with enhanced spatial structure; a sketch of the completion idea follows.
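As a rough illustration of the point-completion idea (in the spirit of SIENet's spatial-shape prediction, not its actual architecture), the following PyTorch sketch encodes the sparse points inside a proposal with a PointNet-style encoder and decodes a fixed-size denser point set. All layer sizes and the class name are assumptions.

```python
# Illustrative point-completion module: encode sparse in-box points
# with a shared per-point MLP + max-pooling, then regress a denser
# point set with a fully connected decoder. Sizes are assumptions.
import torch
import torch.nn as nn

class PointCompletion(nn.Module):
    def __init__(self, num_out_points=256, feat_dim=256):
        super().__init__()
        # Shared per-point MLP; max-pooling yields a global shape code.
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, feat_dim, 1),
        )
        # Decoder regresses coordinates of a fixed-size completed set.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, num_out_points * 3),
        )
        self.num_out_points = num_out_points

    def forward(self, pts):                 # pts: (B, N, 3) sparse points
        x = self.encoder(pts.transpose(1, 2))           # (B, C, N)
        global_feat = torch.max(x, dim=2).values        # (B, C)
        dense = self.decoder(global_feat)               # (B, 3 * M)
        return dense.view(-1, self.num_out_points, 3)   # (B, M, 3)
```

Features extracted from the completed points can then be pooled per proposal and concatenated with the original RoI features during box refinement, which is how the enhanced spatial structure enters the second stage.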
