Abstract

Three-dimensional (3D) object detection plays an important role in computer vision and intelligent transportation systems: the locations and orientations of obstacles in a road scene can be estimated to support navigation for unmanned vehicles. In this paper, we propose a novel network architecture called Frustum FusionNet (F-FusionNet), which effectively extracts and concatenates features from frustum point clouds and RGB images to produce amodal 3D object detection results. To detect objects of different sizes simultaneously, our method divides each frustum point cloud into continuous segments. Our MSE-Net module extracts segment-wise local features at multiple scales using a multi-scale sliding window and fuses them with a segment-wise adaptive learning fusion algorithm. The image features are then aggregated in F-FusionNet to refine the 3D detections. For practicality, the method uses exactly the same network architecture and parameters across all object categories. Extensive experiments and comparisons on the KITTI road-scene benchmark demonstrate the effectiveness of the proposed method.
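
The abstract describes MSE-Net only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' published implementation: it splits a frustum point cloud into continuous depth segments, encodes each segment with a PointNet-style encoder, and fuses multi-scale sliding-window aggregates with learned per-scale softmax weights as one plausible reading of the segment-wise adaptive learning fusion. All names and hyperparameters here (NUM_SEGMENTS, WINDOW_SIZES, FEAT_DIM, SegmentEncoder, MultiScaleSlidingWindow) are assumptions for illustration.

# Minimal illustrative sketch (PyTorch); all names and hyperparameters are
# assumptions, not the published F-FusionNet implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SEGMENTS = 8          # assumed number of continuous frustum segments
WINDOW_SIZES = (1, 3, 5)  # assumed sliding-window extents, in segments (odd sizes)
FEAT_DIM = 64             # assumed per-segment feature dimension

def segment_frustum(points, num_segments=NUM_SEGMENTS):
    """Split a frustum point cloud (N, 3) into continuous segments along depth (z)."""
    depth = points[:, 2]
    edges = torch.linspace(depth.min().item(), depth.max().item(), num_segments + 1)
    segments = []
    for i in range(num_segments):
        upper = depth <= edges[i + 1] if i == num_segments - 1 else depth < edges[i + 1]
        segments.append(points[(depth >= edges[i]) & upper])
    return segments

class SegmentEncoder(nn.Module):
    """PointNet-style encoder: shared per-point MLP followed by max pooling."""
    def __init__(self, feat_dim=FEAT_DIM):
        super().__init__()
        self.feat_dim = feat_dim
        self.mlp = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, feat_dim))

    def forward(self, pts):
        if pts.shape[0] == 0:                    # empty segment -> zero feature
            return torch.zeros(self.feat_dim)
        return self.mlp(pts).max(dim=0).values   # (feat_dim,)

class MultiScaleSlidingWindow(nn.Module):
    """Average segment features over several window sizes and fuse the scales
    with learned softmax weights (a stand-in for adaptive learning fusion)."""
    def __init__(self, window_sizes=WINDOW_SIZES):
        super().__init__()
        self.window_sizes = window_sizes
        self.scale_logits = nn.Parameter(torch.zeros(len(window_sizes)))

    def forward(self, seg_feats):
        x = seg_feats.t().unsqueeze(0)           # (1, feat_dim, num_segments)
        pooled = [F.avg_pool1d(x, kernel_size=w, stride=1, padding=w // 2)
                  for w in self.window_sizes]    # each: (1, feat_dim, num_segments)
        weights = torch.softmax(self.scale_logits, dim=0)
        fused = sum(wt * p for wt, p in zip(weights, pooled))
        return fused.squeeze(0).t()              # (num_segments, feat_dim)

# Usage on synthetic frustum points:
points = torch.rand(2048, 3) * torch.tensor([10.0, 3.0, 40.0])
encoder = SegmentEncoder()
seg_feats = torch.stack([encoder(s) for s in segment_frustum(points)])
fused = MultiScaleSlidingWindow()(seg_feats)     # (NUM_SEGMENTS, FEAT_DIM)

Because all window sizes are odd and padding is set to half the window, every scale yields one feature per segment, so the scales can be combined position-wise; the actual paper may fuse image features and point features differently downstream.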
