Abstract

When localizing and detecting 3D objects in autonomous driving scenes, information from multiple sensors (e.g., camera, LIDAR) can provide mutually complementary cues that enhance the robustness of 3D detectors. In this paper, a deep neural network architecture, named RoIFusion, is proposed to efficiently fuse multi-modality features for 3D object detection by leveraging the advantages of LIDAR and camera sensors. Instead of densely combining the point-wise features of the point cloud with the related pixel features, our fusion method aggregates a small set of 3D Regions of Interest (RoIs) in the point cloud with the corresponding 2D RoIs in the image. This sparse aggregation reduces the computation cost and avoids viewpoint misalignment when fusing features from different sensors. Extensive experiments on the challenging KITTI 3D object detection benchmark show the effectiveness of our fusion method and demonstrate that our deep fusion approach achieves state-of-the-art performance.
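The correspondence between 3D RoIs in the point cloud and 2D RoIs in the image described above is typically obtained by projecting the corners of a 3D box through the camera calibration matrix and taking the tight 2D bounding box of the projected points. The following is a minimal sketch of that step, assuming a KITTI-style 3x4 projection matrix and 3D corners already expressed in the camera frame; the function name and shapes are illustrative, not the paper's implementation:

```python
import numpy as np

def project_3d_roi_to_2d(corners_3d, P):
    """Project the 8 corners of a 3D RoI (camera coordinates, metres)
    through a 3x4 projection matrix P, then take the tight axis-aligned
    2D box of the projected pixels as the corresponding 2D RoI."""
    n = corners_3d.shape[0]
    homo = np.hstack([corners_3d, np.ones((n, 1))])  # (8, 4) homogeneous
    proj = homo @ P.T                                # (8, 3) image-plane coords
    pix = proj[:, :2] / proj[:, 2:3]                 # perspective divide by depth
    x_min, y_min = pix.min(axis=0)
    x_max, y_max = pix.max(axis=0)
    return np.array([x_min, y_min, x_max, y_max])
```

In practice a real pipeline would also clip the resulting box to the image bounds and discard boxes whose corners fall behind the camera.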

Highlights

  • Object detection with 3D bounding boxes is one of the fundamental challenges of situational awareness and environmental perception of autonomous systems

  • We propose a lightweight deep fusion neural network, named RoIFusion, that sparsely fuses a small set of Regions of Interest (RoIs) from the point cloud and the image for 3D object detection, avoiding the cost of dense point-pixel fusion

  • We propose a fused keypoint generation (FKG) layer to estimate a small set of keypoints on the objects for subsequent RoI generation, followed by a voting layer used to generate the center points of the objects
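The voting idea in the last highlight can be sketched as follows: each keypoint predicts an offset toward the center of the object it belongs to, and the resulting votes are aggregated into a center estimate. A minimal numpy illustration, where the offsets stand in for a learned regression head (names and shapes are hypothetical, not the paper's implementation):

```python
import numpy as np

def vote_centers(keypoints, offsets):
    """Each keypoint casts a vote for the object centre by adding its
    predicted offset; the votes are averaged into one centre estimate.

    keypoints: (N, 3) 3D keypoint coordinates
    offsets:   (N, 3) per-keypoint offsets predicted by a network head
    """
    votes = keypoints + offsets          # (N, 3) individual centre votes
    center = votes.mean(axis=0)          # (3,) aggregated centre estimate
    return votes, center
```

In a full detector the aggregation would be done per object (e.g., by grouping votes before averaging), but the per-point offset-and-aggregate pattern is the core of the voting step.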


Introduction

Object detection with 3D bounding boxes is one of the fundamental challenges of situational awareness and environmental perception for autonomous systems (e.g., autonomous vehicles, robots, unmanned aerial vehicles, etc.). In the past few years, 2D object detection has been one of the areas of computer vision with the most significant progress [1]–[12], especially with the advent of convolutional neural network (CNN) technology [13]. 3D object detection remains an open challenge, especially when multiple, heterogeneous sensors are used to obtain more diverse and robust information. Many researchers have focused on point-cloud-based methods for 3D object detection because of the advantages this type of data provides: precise depth information and dense geometric shape features [14]–[19]. Recent point-cloud-only approaches [20]–[25] even outperform fusion-based methods.

