Abstract

3D object detection is widely used in autonomous driving. Embedded devices have limited compute resources and low power budgets, so existing 3D object detection algorithms often cannot complete detection within the required time on such hardware. In this paper, the Complex-YOLOv4-Trans model is proposed for 3D object detection on point cloud data. Firstly, a transformer encoder block is introduced into the Backbone, which makes full use of context information to improve detection accuracy. Secondly, the standard convolutions in the Backbone are replaced with depth-wise separable convolutions, which effectively reduces the computational load of the model. Finally, Complete-IoU (CIoU) is adopted as the loss function, and 8-bit model quantization is performed to speed up inference. We evaluate the method on the KITTI dataset and deploy the model on an NVIDIA Jetson Xavier NX using TensorRT acceleration tools. The mAP of Bird's Eye View detection for the Car class is 71.79%, the mAP for 3D detection is 15.82%, and the model runs at 42 FPS on the NX device. Experimental results show that our Complex-YOLOv4-Trans can perform real-time 3D object detection on low-power embedded devices.
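The abstract does not include implementation details, but the depth-wise separable substitution it describes is a standard construction. A minimal PyTorch sketch is given below; the class name, activation choice, and hyperparameters are illustrative assumptions, not taken from the paper. A standard K×K convolution costs K²·C_in·C_out multiply-accumulates per output pixel, while the separable version costs K²·C_in + C_in·C_out, which is where the claimed reduction in computational load comes from.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) conv
    followed by a 1x1 (pointwise) conv, as a drop-in replacement for a
    standard KxK convolution in the Backbone. Hypothetical sketch."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        padding = kernel_size // 2
        # groups=in_ch makes the convolution depthwise (one filter per channel)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding, groups=in_ch, bias=False)
        # 1x1 pointwise conv mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```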
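Similarly, the CIoU loss mentioned in the abstract augments the IoU term with a center-distance penalty and an aspect-ratio consistency term. The sketch below shows these terms for axis-aligned boxes; the paper's handling of rotated boxes is omitted, and the helper name and box format are assumptions.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes given as (cx, cy, w, h) tensors of shape (N, 4).
    Hypothetical helper, not taken from the paper's code."""
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)

    # Corner coordinates of predicted and target boxes
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

    # Intersection over union
    inter_w = (torch.min(p_x2, t_x2) - torch.max(p_x1, t_x1)).clamp(min=0)
    inter_h = (torch.min(p_y2, t_y2) - torch.max(p_y1, t_y1)).clamp(min=0)
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Squared center distance over squared diagonal of the enclosing box
    cw = torch.max(p_x2, t_x2) - torch.min(p_x1, t_x1)
    ch = torch.max(p_y2, t_y2) - torch.min(p_y1, t_y1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = (px - tx) ** 2 + (py - ty) ** 2

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (torch.atan(tw / (th + eps))
                              - torch.atan(pw / (ph + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)

    return (1 - iou + rho2 / c2 + alpha * v).mean()
```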
