Abstract

Camera and millimeter-wave (MMW) radar fusion is essential for accurate and robust autonomous driving systems. With the advancement of radar technology, next-generation high-resolution automotive radar, i.e., 4D radar, has emerged. In addition to the target range, azimuth, and Doppler velocity measured by traditional radar, 4D radar provides elevation measurements, yielding a denser point cloud. In this study, we propose a camera and 4D radar fusion network called RCFusion, which fuses multimodal features in a unified bird's-eye view (BEV) space for 3D object detection. In the camera stream, multi-scale feature maps are extracted by the image backbone and a feature pyramid network; they are then converted into orthographic feature maps by an orthographic feature transform. Next, enhanced, fine-grained image BEV features are obtained via a specially designed shared attention encoder. Meanwhile, in the 4D radar stream, a newly designed component named Radar PillarNet efficiently encodes the radar features into radar pseudo-images, which are fed into the point cloud backbone to produce radar BEV features. An interactive attention module is proposed for the fusion stage, effectively fusing the BEV features of the two modalities. Finally, a generic detection head predicts object classes and locations. The proposed RCFusion is validated on the TJ4DRadSet and View-of-Delft datasets. The experimental results and analysis show that the proposed method can effectively fuse camera and 4D radar features to achieve robust detection performance.
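Since the abstract names the interactive attention module but does not detail it, the following is a minimal PyTorch sketch of one plausible form of cross-modal BEV fusion: each stream's channel attention gates the other stream before a 1×1 convolution merges them. All class names, channel sizes, and the specific gating scheme are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class InteractiveAttentionFusion(nn.Module):
    """Hypothetical sketch of an interactive attention fusion step:
    each modality's BEV map gates the other via channel attention,
    then the two gated streams are concatenated and projected back
    to a single fused BEV map."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel-attention branch per modality: global pool -> 1x1 conv -> sigmoid gate.
        self.img_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.radar_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Merge the two cross-gated streams into one BEV feature map.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, img_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # Cross-modal gating: radar attention re-weights image channels and
        # vice versa, so each stream emphasizes channels the other modality
        # deems informative.
        img_attended = img_bev * self.radar_gate(radar_bev)
        radar_attended = radar_bev * self.img_gate(img_bev)
        return self.fuse(torch.cat([img_attended, radar_attended], dim=1))


if __name__ == "__main__":
    # Both streams are assumed to produce BEV maps on the same grid,
    # e.g. 256 channels over a 128 x 128 BEV plane.
    img_bev = torch.randn(1, 256, 128, 128)
    radar_bev = torch.randn(1, 256, 128, 128)
    fused = InteractiveAttentionFusion(256)(img_bev, radar_bev)
    print(fused.shape)  # torch.Size([1, 256, 128, 128])
```

The point of cross-gating over plain concatenation is that each modality can suppress the other's uninformative channels before merging, which is the general behavior an "interactive" attention fusion aims at; the exact mechanism in RCFusion may differ.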
