Abstract

Radar signals can describe the environment more reliably than RGB images when lighting is too strong or too weak, or when objects are occluded. Radio frequency (RF) images are a common representation of radar signals and contain rich motion information, but they are less intuitive than RGB images. Effectively utilizing radar semantic information is a challenge in the radar object detection task. In this work, we focus on extracting semantic information from RF images and propose a feature extraction network named the multi-scale spatiotemporal features fusion and attention network (MSFFANet). An encoder-decoder framework with skip connections realizes end-to-end radar object detection. First, to effectively exploit the motion information in radar signals, a multi-scale spatiotemporal features fusion block (MSFFB) is designed: standard 3D convolutions and 3D dilated convolutions jointly process the input RF image sequences to achieve cross-channel multi-scale feature fusion. Second, to make the decoder network focus on object features, the convolutional block attention module (CBAM) is employed to weight the feature maps, strengthening useful features and refining the representation. Finally, experiments and evaluations are performed on the ROD2021 dataset. The average precision reaches 76.5542%, the average recall reaches 80.4734%, and the F1-score is 0.7846. Compared with the best-performing radar object detection network (RODNet) [4], our method improves average precision by 1.6117%, average recall by 1.432%, and F1-score by 0.0152.
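To make the fusion idea concrete, the following is a minimal PyTorch sketch of an MSFFB-style block, assuming two parallel 3D branches (a standard convolution and a dilated convolution) whose outputs are concatenated across channels and fused by a 1x1x1 convolution; the channel counts, dilation rate, and input shape are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class MSFFB(nn.Module):
    """Sketch of a multi-scale spatiotemporal feature fusion block.

    Two parallel 3D branches -- a standard convolution and a dilated
    convolution -- see the RF image sequence at different receptive
    fields; their outputs are concatenated across channels and fused.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Standard 3D convolution branch (dilation = 1).
        self.std_branch = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Dilated 3D convolution branch (dilation = 2, an assumed rate)
        # covering a larger spatiotemporal neighborhood of the input.
        self.dil_branch = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=2, dilation=2),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )
        # 1x1x1 convolution fuses the concatenated multi-scale features
        # across channels (cross-channel fusion).
        self.fuse = nn.Conv3d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width) RF image sequence.
        multi_scale = torch.cat([self.std_branch(x), self.dil_branch(x)], dim=1)
        return self.fuse(multi_scale)


# Example: a sequence of 8 two-channel (real/imaginary) RF frames.
rf_seq = torch.randn(1, 2, 8, 128, 128)
out = MSFFB(in_ch=2, out_ch=32)(rf_seq)
print(out.shape)  # torch.Size([1, 32, 8, 128, 128])
```

In the full network, a CBAM module would then reweight such decoder features along the channel and spatial dimensions; its standard formulation is described in the CBAM paper and is omitted here for brevity.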
