Abstract

Pseudo point cloud representations can significantly improve the precision of 3D object detection. However, existing pseudo point cloud-based methods typically fuse the processed features through coarse concatenation, which ignores the consistency between point cloud and pseudo point cloud features. This inconsistency between features from different modalities can lead to detection bias. In this paper, we propose a novel pseudo point cloud-based network called SGF3D, which uses a cross-modal attention module, cross-modal attention fusion (CMAF), to fuse point cloud and pseudo point cloud features. CMAF better learns the cross-modal similarity of the output features, enabling the detection box to fit the target more closely. We also design a region of interest (RoI) head, the similarity attention head (SAH), which exploits this otherwise overlooked similarity to optimize training without increasing the complexity of the network. By using CMAF and SAH, the proposed method obtains more accurate bounding boxes. Extensive experiments on the KITTI dataset demonstrate that the proposed method achieves competitive results. Training code and trained weights are available at https://github.com/ChunZheng2022/SGF3D.
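To make the contrast with coarse concatenation concrete, the following is a minimal sketch of generic scaled dot-product cross-attention between two feature sets. It is not the CMAF module itself, whose design the abstract does not specify. The function name `cross_attention_fuse`, the absence of learned projections, and the residual sum are all assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention_fuse(point_feats, pseudo_feats):
    """Fuse each point cloud feature with an attention-weighted summary
    of the pseudo point cloud features.

    Hypothetical simplification: queries are the raw point features and
    keys/values are the raw pseudo point features (no learned weights).
    """
    dim = len(point_feats[0])
    scale = math.sqrt(dim)
    fused, all_weights = [], []
    for q in point_feats:
        # Similarity of this point feature to every pseudo point feature.
        scores = [dot(q, k) / scale for k in pseudo_feats]
        weights = softmax(scores)
        # Weighted average of pseudo point features, per dimension.
        attended = [
            sum(w * v[d] for w, v in zip(weights, pseudo_feats))
            for d in range(dim)
        ]
        # Residual sum keeps the original point feature (an assumption).
        fused.append([qi + ai for qi, ai in zip(q, attended)])
        all_weights.append(weights)
    return fused, all_weights
```

Unlike concatenation, the fused feature here depends on how similar each point feature is to each pseudo point feature, so more consistent cross-modal pairs contribute more to the output.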
