Abstract

Transformers improve the performance of 3D object detection with few hyperparameters. Inspired by the recent success of pre-trained Transformers in 2D object detection and natural language processing, we propose a pretext task named random block detection to unsupervisedly pre-train 3DETR (UP3DETR). Specifically, we sample random blocks from the original point clouds and feed them into the Transformer decoder. The whole Transformer is then trained to detect the locations of these blocks. This pretext task pre-trains the Transformer-based 3D object detector without any manual annotations. In our experiments, UP3DETR performs 6.2% better than the 3DETR baseline on the challenging ScanNetV2 dataset and converges faster on object detection tasks.

Keywords: Unsupervised pre-training, Transformer, 3D object detection
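The abstract does not include code, but the random block detection pretext task can be illustrated with a minimal sketch. The snippet below, which assumes axis-aligned cubic blocks and a NumPy point cloud (the function name, parameters, and block shape are illustrative assumptions, not the paper's exact procedure), shows how pseudo bounding-box targets could be generated from randomly sampled blocks:

```python
import numpy as np

def sample_random_blocks(points, num_blocks=3, block_size=1.0, rng=None):
    """Sample axis-aligned blocks from a point cloud and return each block's
    points plus its bounding box (center, size) as pseudo detection targets.
    Hypothetical helper: a sketch of the pretext-task idea, not the paper's code."""
    rng = np.random.default_rng() if rng is None else rng
    blocks = []
    for _ in range(num_blocks):
        # Pick a random seed point as the block center.
        center = points[rng.integers(len(points)), :3]
        half = block_size / 2.0
        # Keep all points falling inside the axis-aligned block.
        mask = np.all(np.abs(points[:, :3] - center) <= half, axis=1)
        block_points = points[mask]
        if len(block_points) == 0:
            continue
        # The tight bounding box of the block serves as the pseudo-label
        # the detector is trained to localize.
        lo, hi = block_points[:, :3].min(axis=0), block_points[:, :3].max(axis=0)
        blocks.append({
            "points": block_points,
            "box_center": (lo + hi) / 2.0,
            "box_size": hi - lo,
        })
    return blocks

if __name__ == "__main__":
    cloud = np.random.rand(2048, 3) * 4.0   # toy point cloud
    pseudo_boxes = sample_random_blocks(cloud, num_blocks=5)
    print(f"sampled {len(pseudo_boxes)} blocks; first center:",
          pseudo_boxes[0]["box_center"])
```

In the paper's setup, the sampled block points are fed to the Transformer decoder as queries while the full scene is encoded, and the detection loss supervises the predicted block locations; the sketch above only covers the target-generation step.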
