Abstract

3D vehicle detection based on multi-modal fusion is an important task in many applications such as autonomous driving. Although significant progress has been made, we still observe two aspects that call for further improvement: First, previous works have seldom explored what extra information images can contribute to complement point clouds in 3D detection tasks. Second, most fusion modules can only be used in the networks they were designed for, and thus lack generality. In this work, we propose PointAttentionFusion and DenseAttentionFusion: two end-to-end trainable, single-stage multi-modal feature fusion approaches that adaptively combine the RGB and point cloud modalities. Experimental results on the KITTI dataset demonstrate a significant improvement in filtering out false positives over approaches using only point cloud data. Furthermore, the proposed methods achieve competitive results compared to the published state-of-the-art multi-modal methods on the KITTI benchmark. Both fusion modules are applicable to any voxel-based 3D detection architecture, where similar improvements are expected.
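To illustrate the kind of adaptive fusion described above, the following is a minimal, hypothetical PyTorch sketch of attention-gated fusion between per-point LiDAR features and image features sampled at each point's 2D projection. The module name, dimensions, and gating formulation are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch: attention-weighted fusion of point-cloud and image
# features. Not the paper's exact design; dimensions are illustrative.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Gates image features with a learned per-point attention weight before
    fusing them with point-cloud features, so unreliable image evidence can
    be suppressed adaptively."""

    def __init__(self, point_dim: int, image_dim: int):
        super().__init__()
        # Scalar attention weight per point, computed from both modalities.
        self.attention = nn.Sequential(
            nn.Linear(point_dim + image_dim, 1),
            nn.Sigmoid(),
        )
        # Project image features into the point-feature space.
        self.image_proj = nn.Linear(image_dim, point_dim)

    def forward(self, point_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # point_feats: (N, point_dim); image_feats: (N, image_dim),
        # already gathered at each point's 2D image projection.
        w = self.attention(torch.cat([point_feats, image_feats], dim=-1))  # (N, 1)
        return point_feats + w * self.image_proj(image_feats)


# Example usage with random features for 1024 points.
fusion = AttentionFusion(point_dim=64, image_dim=128)
fused = fusion(torch.randn(1024, 64), torch.randn(1024, 128))
print(fused.shape)  # torch.Size([1024, 64])
```

Because the fusion operates on per-point (or per-voxel) features before the detection head, a module of this shape can in principle be dropped into any voxel-based pipeline, which is the generality the abstract claims.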
