Abstract

Multi-modal fusion plays a critical role in 3D object detection, overcoming the inherent limitations of single-sensor perception in autonomous driving. Most fusion methods rely on data from high-resolution cameras and LiDAR sensors; such setups are less robust, and detection accuracy drops drastically at long range as the point cloud density decreases. Fusing Radar and LiDAR alleviates these issues, but this remains a developing field, especially for 4D Radar, which offers a more robust and broader detection range. Nevertheless, the differing data characteristics and noise distributions of the two sensors hinder performance when they are integrated directly. We therefore propose what is, to our knowledge, the first fusion method of this kind for 4D Radar and LiDAR, termed $M^{2}$-Fusion, based on Multi-modal and Multi-scale fusion. To better integrate the two sensors, we propose an Interaction-based Multi-Modal Fusion (IMMF) method that uses a self-attention mechanism to learn features from each modality and exchange intermediate-layer information. To address the precision-efficiency trade-off of single-resolution voxel division, we also put forward a Center-based Multi-Scale Fusion (CMSF) method that first regresses object center points and then extracts features at multiple resolutions. Furthermore, we present a data preprocessing method based on a Gaussian distribution that effectively suppresses data noise, reducing errors caused by point cloud divergence of 4D Radar data in the $x$-$z$ plane. To evaluate the proposed fusion method, we conducted a series of experiments on the Astyx HiRes 2019 dataset, which contains calibrated 4D Radar and 16-line LiDAR data. The results demonstrate that our fusion method compares favorably with state-of-the-art algorithms. Compared to PointPillars, our method improves mAP (mean average precision) by 5.64% and 13.57% for 3D and BEV (bird's-eye-view) detection of the car class at the moderate difficulty level, respectively.
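
As a rough illustration of the idea behind IMMF (not the authors' implementation), the sketch below applies self-attention over the concatenated LiDAR and 4D Radar feature tokens so that intermediate-layer information can be exchanged between the two branches. The module name, feature dimensions, and the joint-token layout are assumptions made for illustration only.

```python
# Hypothetical sketch of an interaction-based multi-modal fusion block,
# inspired by IMMF; NOT the paper's implementation.
import torch
import torch.nn as nn

class IMMFSketch(nn.Module):
    """Exchange intermediate features between LiDAR and 4D Radar branches
    via self-attention over their concatenated tokens (illustrative only)."""
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_feat: torch.Tensor, radar_feat: torch.Tensor):
        # lidar_feat: (B, N_l, C), radar_feat: (B, N_r, C)
        # Concatenate both modalities into one token sequence so self-attention
        # can pass information across sensors.
        x = torch.cat([lidar_feat, radar_feat], dim=1)  # (B, N_l + N_r, C)
        update, _ = self.attn(x, x, x)                  # joint self-attention
        x = self.norm(x + update)                       # residual + layer norm
        # Split back into per-modality feature streams.
        n_l = lidar_feat.shape[1]
        return x[:, :n_l], x[:, n_l:]

if __name__ == "__main__":
    block = IMMFSketch(dim=128, num_heads=4)
    lidar = torch.randn(2, 256, 128)  # e.g. 256 voxel/pillar features per sample
    radar = torch.randn(2, 64, 128)   # e.g. 64 radar features per sample
    l_out, r_out = block(lidar, radar)
    print(l_out.shape, r_out.shape)   # (2, 256, 128) and (2, 64, 128)
```

In this sketch, the residual connection preserves each modality's original features while the attention update injects complementary cues from the other sensor; how the actual IMMF module structures this exchange is detailed in the paper itself.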
