Robust object detection in complex environments, poor visual conditions, and open scenarios presents significant technical challenges in autonomous driving. These challenges necessitate the development of advanced fusion methods for millimeter-wave (mmWave) radar point cloud data and visual images. To address these issues, this paper proposes a radar-camera robust fusion network (RCRFNet), which leverages self-supervised learning and open-set recognition to effectively utilise the complementary information from both sensors. Specifically, the network uses matched radar-camera data through a frustum association approach to generate self-supervised signals, enhancing network training. The integration of global and local depth consistencies between radar point clouds and visual images, along with image features, helps construct object class confidence levels for detecting unknown targets. Additionally, these techniques are combined with a multi-layer feature extraction backbone and a multimodal feature detection head to achieve robust object detection. Experiments on the nuScenes public dataset demonstrate that RCRFNet outperforms state-of-the-art (SOTA) methods, particularly in conditions of low visual visibility and when detecting unknown class objects.
Read full abstract