Abstract

This study proposes a two-phase cross-modality fusion detector for robust, high-precision 3D object detection from RGB images and LiDAR point clouds. First, a two-stream fusion network is built within the Faster R-CNN framework to perform accurate and robust 2D detection. The visible stream takes RGB images as input, while the intensity stream is fed intensity maps generated by projecting the reflection intensity of the point cloud onto the front view. A multi-layer feature-level fusion scheme merges multi-modal features across multiple layers to enhance the expressiveness and robustness of the features from which region proposals are generated. Second, a decision-level fusion is implemented by projecting the 2D proposals into the point-cloud space to generate 3D frustums, on which the second-phase 3D detector performs instance segmentation and 3D box regression over the filtered points. Results on the KITTI benchmark show that features extracted from RGB images and intensity maps complement each other, and the proposed detector achieves state-of-the-art 3D detection performance with a substantially lower running time than existing competitors.
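To make the intensity stream's input concrete, below is a minimal Python sketch of building a front-view intensity map from a LiDAR sweep by mapping each point to an image cell via its azimuth and elevation angles. The image size, field-of-view bounds, and nearest-point overwrite rule are illustrative assumptions; the abstract does not specify the exact projection parameters.

```python
# Sketch: front-view intensity map from a LiDAR sweep (assumed parameters).
import numpy as np

def front_view_intensity_map(points, intensities, h=64, w=512,
                             fov_up=np.radians(3.0),
                             fov_down=np.radians(-25.0)):
    """points: (N, 3) LiDAR xyz; intensities: (N,) reflectance in [0, 1]."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8

    yaw = np.arctan2(y, x)    # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)  # elevation

    # Normalize angles to pixel coordinates on an h x w front-view grid.
    u = ((np.pi - yaw) / (2.0 * np.pi) * w).astype(np.int32)
    v = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(np.int32)
    u = np.clip(u, 0, w - 1)
    v = np.clip(v, 0, h - 1)

    # Fill far points first so nearer points overwrite them in each cell.
    order = np.argsort(-r)
    img = np.zeros((h, w), dtype=np.float32)
    img[v[order], u[order]] = intensities[order]
    return img
```

The resulting single-channel map is spatially aligned with the forward-facing scene, which is what allows it to be consumed by a second image-style convolutional stream alongside the RGB input.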

Highlights

  • High-precision object detection, a crucial task in engineering applications such as autonomous driving and safety management, has drawn a great deal of attention in recent years

  • A large number of deep learning-based models, such as Faster R-CNN, SSD, and YOLO [1,2,3], along with many customized variants, have been developed for 2D object detection on RGB images

  • We propose implementing a feature-level fusion in the first phase and building the 2D detector in the framework of a two-stream Faster R-CNN [1] to extract and fuse features from RGB images and intensity maps, which are generated by projecting the reflection intensity values of LiDAR point clouds onto the front-view plane



Introduction

As a crucial task in engineering applications such as autonomous driving and safety management, high-precision object detection has drawn a great deal of attention in recent years. We propose implementing a feature-level fusion in the first phase and building the 2D detector in the framework of a two-stream Faster R-CNN [1] to extract and fuse features from RGB images and intensity maps, which are generated by projecting the reflection intensity values of LiDAR point clouds onto the front-view plane. The aim of this design is to produce more robust and expressive features upon which more accurate object classification and bounding-box regression can be achieved. The validity of the proposed feature fusion scheme is examined and strongly supported by the experimental results and by visualizing features at multiple network stages.
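Below is a minimal PyTorch sketch of the multi-layer feature-level fusion idea: an RGB stream and an intensity-map stream run in parallel, and at each stage their feature maps are concatenated and compressed back with a 1x1 convolution. The stage layout, channel widths, and the concatenation-plus-1x1-conv merge are illustrative assumptions, not the paper's exact design.

```python
# Sketch: two-stream backbone with fusion at every stage (assumed design).
import torch
import torch.nn as nn

def conv_stage(c_in, c_out):
    # One downsampling conv stage, structurally shared by both streams.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class TwoStreamFusionBackbone(nn.Module):
    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        rgb_in, ir_in = 3, 1  # RGB image vs. single-channel intensity map
        self.rgb_stages = nn.ModuleList()
        self.int_stages = nn.ModuleList()
        self.fuse = nn.ModuleList()
        for c_out in channels:
            self.rgb_stages.append(conv_stage(rgb_in, c_out))
            self.int_stages.append(conv_stage(ir_in, c_out))
            # 1x1 conv merges the two modalities at this layer.
            self.fuse.append(nn.Conv2d(2 * c_out, c_out, 1))
            rgb_in = ir_in = c_out

    def forward(self, rgb, intensity):
        fused_maps = []
        for rgb_stage, int_stage, fuse in zip(
                self.rgb_stages, self.int_stages, self.fuse):
            rgb = rgb_stage(rgb)
            intensity = int_stage(intensity)
            # Merge the modalities at this layer; keeping every fused map
            # gives the proposal stage multi-scale, multi-modal features.
            fused_maps.append(fuse(torch.cat([rgb, intensity], dim=1)))
        return fused_maps

# Usage: feats = TwoStreamFusionBackbone()(
#     torch.randn(1, 3, 256, 512), torch.randn(1, 1, 256, 512))
```

Fusing at multiple depths rather than only at the final layer lets the region proposal network draw on both low-level alignment cues and high-level semantics from the two modalities.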

Contents

  • Related Work
  • Object Detection Based on Multi-Modal Fusion
  • Overview
  • PVConvNet-Based Object Detection
  • Point-Voxel Convolution
  • Experimental Setups
  • Implementation Details
  • Cross-Modality Fusion
  • Cascade Detector Head and Attention-Based Weighted Fusion
  • Method
  • Findings
  • Conclusions
