Abstract
Three-dimensional object detection is a key task in autonomous driving, aimed at identifying the position and category of objects in a scene. Because LiDAR directly produces 3D data, most models use it as the input for detection. However, the low scanning resolution of LiDAR for distant objects imposes inherent limitations on such methods, and multimodal fusion 3D object detection methods, which mostly take both LiDAR and camera data as inputs, have therefore attracted widespread attention. Multimodal methods nonetheless introduce problems of their own, the two main ones being incomplete utilization of camera features and coarse fusion strategies. In this study, we propose a novel multimodal 3D object detection method named VirtualFilter, which takes 3D point clouds and 2D images as inputs. To better exploit camera features, VirtualFilter applies an image semantic segmentation model to extract semantic features and uses this semantic information to filter virtual points during virtual point cloud generation, improving the accuracy of the virtual point cloud. In addition, VirtualFilter employs an improved RoI feature fusion strategy named 3D-DGAF (3D Distance-based Grid Attentional Fusion), which uses an attention mechanism based on distance gridding to better fuse the RoI features of the original and virtual point clouds. Experimental results on the widely used KITTI autonomous driving dataset show that this multimodal 3D object detection method outperforms the baseline method on several evaluation metrics.
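The following is a minimal sketch, not the paper's implementation, of the general idea behind the semantic filtering step described above: virtual points lifted from image pixels are kept only when the corresponding pixel is assigned a foreground class by a 2D semantic segmentation model. The function name, the projection helper, and the class label ids are all hypothetical.

```python
import numpy as np

# Assumed label ids for foreground classes (e.g. car, pedestrian, cyclist).
FOREGROUND_CLASSES = {1, 2, 3}

def filter_virtual_points(virtual_points, semantic_map, project_to_image):
    """Keep only virtual points whose source pixels are labeled as foreground.

    virtual_points   : (N, 3) array of candidate virtual 3D points
    semantic_map     : (H, W) array of per-pixel class ids from a segmentation model
    project_to_image : callable mapping (N, 3) points to (N, 2) pixel coordinates (u, v)
    """
    h, w = semantic_map.shape
    uv = np.round(project_to_image(virtual_points)).astype(int)   # project to image plane
    in_image = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)

    keep = np.zeros(len(virtual_points), dtype=bool)
    labels = semantic_map[uv[in_image, 1], uv[in_image, 0]]       # class id at each projected pixel
    keep[in_image] = np.isin(labels, list(FOREGROUND_CLASSES))    # retain foreground points only
    return virtual_points[keep]
```

Under these assumptions, the filtered virtual points would then be merged with the original LiDAR points before RoI feature extraction and fusion.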