Abstract

Environmental perception is a key technology for autonomous driving. It is currently performed mostly with cameras and LiDAR, despite their susceptibility to interference and high cost. Millimeter-wave radar can mitigate these problems, but existing radar-based models suffer from excessive complexity and weak global modeling capability. Moreover, the anti-interference capability of millimeter-wave radar object detection (ROD) techniques in complex environments remains a major challenge. To address these issues, this article proposes a novel model, the Transformer ROD network (T-RODNet), which combines a convolutional neural network (CNN) and a transformer to capture local and global features simultaneously. To strengthen the modeling capability of the encoder and decoder, a dimensional apart module (DAM) and T-window multihead self-attention (T-W-MSA)/shifted-window multihead self-attention (SW-MSA) modules are proposed, which substantially improve the model's performance. Experiments show that T-RODNet achieves state-of-the-art (SOTA) performance on both the CRUW and CARRADA datasets. The GFLOPs of T-RODNet are only 8.5% of those of RODNet-HG, yet its average precision (AP) is 3.84 points higher. In addition, T-RODNet shows strong resistance to interference on the CRUW dataset with added noise.
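The abstract's W-MSA/SW-MSA modules follow the windowed-attention idea of restricting self-attention to local windows, with a cyclic shift between layers to exchange information across window boundaries. The sketch below is a generic, single-head NumPy illustration of that idea under stated assumptions (identity query/key/value projections, no attention mask for the shifted windows); it is not the paper's exact T-W-MSA implementation.

```python
import numpy as np

def window_partition(x, ws):
    # (H, W, C) feature map -> (num_windows, ws*ws, C) window tokens
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, ws, shift=0):
    # shift=0 gives plain W-MSA; shift>0 gives a (simplified) SW-MSA
    # via a cyclic roll before partitioning, undone afterwards.
    H, W, C = x.shape
    if shift:
        x = np.roll(x, (-shift, -shift), axis=(0, 1))
    win = window_partition(x, ws)                        # (nW, ws*ws, C)
    # scaled dot-product attention within each window
    attn = softmax(win @ win.transpose(0, 2, 1) / np.sqrt(C))
    out = attn @ win                                     # (nW, ws*ws, C)
    # reverse the window partition back to (H, W, C)
    nH, nW = H // ws, W // ws
    out = out.reshape(nH, nW, ws, ws, C).transpose(0, 2, 1, 3, 4).reshape(H, W, C)
    if shift:
        out = np.roll(out, (shift, shift), axis=(0, 1))
    return out

# toy radar-like feature map: 8x8 spatial grid, 4 channels
x = np.random.rand(8, 8, 4)
y = window_self_attention(x, ws=4, shift=2)
print(y.shape)  # (8, 8, 4)
```

Because attention is computed per window, the cost scales linearly with the number of windows rather than quadratically with the full feature-map size, which is one reason window-based transformers can be far cheaper in GFLOPs than global attention.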
