Reliable environmental perception is a prerequisite for autonomous driving. Cameras and LiDAR are sensitive to illumination and weather conditions, whereas millimeter-wave radar is largely immune to these issues. Existing models rely heavily on image-based approaches, which may not fully characterize radar sensor data or exploit them efficiently for perception tasks. This paper rethinks how radar signals are modeled and proposes a novel U-shaped multilayer perceptron network (U-MLPNet) that aims to learn omni-dimensional spatio-temporal dependencies. Our method combines a 3D CNN for spatio-temporal feature extraction with an encoder-decoder framework whose cross-shaped receptive fields are specifically designed to capture the sparse and non-uniform characteristics of radar signals. We conducted extensive experiments on a diverse dataset of urban driving scenarios, evaluating the sensor's performance in multi-view semantic segmentation and object detection. U-MLPNet achieves competitive performance against state-of-the-art (SOTA) methods, improving mAP by 3.0% and mDice by 2.7% in range-Doppler (RD) segmentation, and average recall (AR) and average precision (AP) by 1.77% and 2.03%, respectively, in object detection. These improvements mark an advance in radar-based perception for autonomous vehicles, potentially enhancing their reliability and safety across diverse driving conditions.
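Since the abstract describes the architecture only at a high level, the following is a minimal, hypothetical sketch (not the authors' implementation) of the two ingredients it names: a 3D CNN stem for spatio-temporal feature extraction, and an MLP block whose row-wise and column-wise mixing gives each position a cross-shaped receptive field. All module names, tensor shapes, and the exact mixing scheme are illustrative assumptions.

```python
# Hypothetical sketch of the two ideas named in the abstract; layer names,
# sizes, and the mixing scheme are assumptions, not the paper's design.
import torch
import torch.nn as nn


class SpatioTemporalStem(nn.Module):
    """3D convolution over (time, H, W) radar frames to extract spatio-temporal features."""

    def __init__(self, in_ch: int = 1, out_ch: int = 32):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, H, W)
        return self.act(self.conv(x))


class CrossShapedMLP(nn.Module):
    """MLP that mixes along rows and columns separately, so each position
    aggregates information from a cross-shaped (row + column) region."""

    def __init__(self, h: int, w: int, ch: int):
        super().__init__()
        self.row_mix = nn.Linear(w, w)  # mixes along the width, one row at a time
        self.col_mix = nn.Linear(h, h)  # mixes along the height, one column at a time
        self.norm = nn.LayerNorm(ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, ch, H, W)
        rows = self.row_mix(x)  # Linear acts on the last dim (W)
        cols = self.col_mix(x.transpose(-1, -2)).transpose(-1, -2)  # act on H, restore layout
        out = x + rows + cols  # residual cross-shaped mixing
        # LayerNorm over channels: move ch last, normalize, move it back
        return self.norm(out.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


if __name__ == "__main__":
    frames = torch.randn(2, 1, 4, 64, 64)        # (batch, ch, time, H, W) radar clip
    feats = SpatioTemporalStem()(frames)          # (2, 32, 4, 64, 64)
    fused = feats.mean(dim=2)                     # collapse time for the 2D mixer
    mixed = CrossShapedMLP(h=64, w=64, ch=32)(fused)
    print(mixed.shape)                            # torch.Size([2, 32, 64, 64])
```

One motivation for this kind of axis-wise mixing over full attention is that radar returns are sparse and non-uniform: a cross-shaped receptive field still links each cell to its entire range row and Doppler column at linear rather than quadratic cost, which matches the abstract's stated design goal.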