Abstract

3D object detection plays a crucial role in the perception systems of autonomous vehicles; however, a vehicle's field of view is restricted by occlusions from nearby vehicles and buildings. Vehicle-infrastructure cooperation can compensate for this limited visibility, but discrepancies between the timestamps of vehicle and infrastructure sensors, together with data transmission delays, typically introduce a time asynchrony between vehicle-side and infrastructure-side data. To address this, we introduce Feature Transformer (FETR), a vehicle-infrastructure cooperative 3D object detection model that uses a Transformer as a feature predictor. The Transformer predictor forecasts the features of a future frame from the current-frame features, efficiently mitigating the time-asynchrony problem. In addition, to improve 3D detection precision, we introduce a plug-and-play module named Mask Feature Enhancement (MFE). MFE applies a mask that amplifies the features in object regions while suppressing those of the surrounding environment, enlarging the gap between object and environmental features and thereby improving detection. Experimental results show that FETR attains 68.15 BEV-mAP (IoU=0.5) on the DAIR-V2X dataset under a 200 ms latency, while transmitting only 6.0×10⁴ bytes, just 4.2% of the original point cloud data, outperforming current vehicle-infrastructure cooperative models in both precision and data transmission.
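
Since the abstract describes FETR's two components only at a high level, the PyTorch sketch below illustrates one plausible wiring of a Transformer feature predictor and a mask-based feature enhancement. Every module name, shape, and hyperparameter here (FeaturePredictor, MaskFeatureEnhancement, d_model=256, and so on) is an assumption for illustration, not the authors' actual implementation.

```python
# A minimal, hypothetical sketch of the two components described above.
# All names, shapes, and hyperparameters are illustrative assumptions,
# not taken from the FETR paper.
import torch
import torch.nn as nn

class FeaturePredictor(nn.Module):
    """Predicts future-frame BEV features from current-frame features,
    compensating for vehicle-infrastructure time asynchrony."""
    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, feats):            # feats: (B, H*W, C) flattened BEV map
        return self.encoder(feats)       # predicted future-frame features

class MaskFeatureEnhancement(nn.Module):
    """Plug-and-play MFE: a learned soft mask amplifies features in
    object regions and attenuates the surrounding environment."""
    def __init__(self, channels=256):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),                # soft mask in (0, 1)
        )

    def forward(self, bev):              # bev: (B, C, H, W)
        m = self.mask_head(bev)          # (B, 1, H, W)
        # Mask near 1 -> amplify (up to 1.5x); near 0 -> suppress (0.5x),
        # widening the gap between object and environmental features.
        return bev * (0.5 + m)

# Usage: predict infrastructure features at the vehicle's timestamp,
# then enhance the BEV map before fusion and the detection head.
B, C, H, W = 2, 256, 64, 64
infra = torch.randn(B, C, H, W)
pred = FeaturePredictor(d_model=C)(infra.flatten(2).transpose(1, 2))
pred = pred.transpose(1, 2).reshape(B, C, H, W)
enhanced = MaskFeatureEnhancement(C)(pred)
```

The multiplicative scaling in MFE keeps the enhancement differentiable end to end while still suppressing background responses, which is one simple way to realize the amplify-object, diminish-environment behavior the abstract describes.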
