Abstract

Around-view multi-camera 3D object detection in BEV (Bird’s-Eye-View) space has been a research focus over the past few years. As a typical supervised training task, many researchers have advanced this area with task-specific key designs, such as exploiting temporal information and the correspondence between the perspective image plane and BEV space. Most of these works follow the DETR detection framework, yet the nature of DETR’s learnable queries, which encode objects’ center and bounding-box information, has not been discussed in previous studies. In this paper, we take advantage of this prior and extend it to 3D detection tasks. In 3D object detection, ground-truth bounding boxes rarely overlap; under this hypothesis, the queries should therefore be more diverse. To achieve this goal, we propose a Plug-in Discrimination Module (PDM) that discriminates each learnable query from all the others with a discrimination loss, ensuring query diversity. The PDM is a simple train-time-only module: it contains a query projection head that projects all object queries into a common latent space, where the discrimination loss is applied to all the queries. Experimental results show that this design directly improves the 3D detector’s performance without modifying the detector’s architecture or adding extra inference cost. Compared with the baseline model, the NDS improvement on the nuScenes dataset reaches a maximum of 1.62% at the 8th training epoch and averages 0.64% in the following epochs.
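The PDM described above (a projection head followed by a discrimination loss that pushes queries apart in a shared latent space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the projection head is reduced to a single linear layer, the latent dimension is arbitrary, and the discrimination loss is assumed to penalize pairwise cosine similarity between distinct queries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes; the paper does not specify these values.
num_queries, query_dim, latent_dim = 8, 16, 4

# Query projection head, reduced to one linear map for this sketch.
W = rng.standard_normal((query_dim, latent_dim)) / np.sqrt(query_dim)

def discrimination_loss(queries, W):
    """Project queries into a common latent space and penalize
    similarity between every pair of distinct queries."""
    z = queries @ W                                   # (N, latent_dim)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize
    sim = z @ z.T                                     # pairwise cosine similarity
    n = sim.shape[0]
    off_diag = sim[~np.eye(n, dtype=bool)]            # exclude self-pairs
    return np.mean(off_diag ** 2)                     # low when queries are diverse

queries = rng.standard_normal((num_queries, query_dim))
loss = discrimination_loss(queries, W)
```

Because the module only adds this auxiliary loss during training, it can be dropped at inference time, which is why it incurs no extra inference cost.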
