Abstract

Recently, the pre-training paradigm combining Transformers and masked language modeling, as in BERT, has achieved tremendous success not only in NLP but also in images and point clouds. However, directly extending BERT from NLP to point clouds requires first training a discrete Variational AutoEncoder (dVAE) as the tokenizer, which results in a complex two-stage process, as in Point-BERT. Inspired by BERT and MoCo, we propose POS-BERT, a one-stage BERT pre-training method for point clouds. Specifically, we pre-train on point clouds with a masked patch modeling (MPM) task, which aims to recover the information of masked patches under the supervision of a tokenizer's output. Unlike Point-BERT, whose tokenizer is trained separately and then frozen, we propose a momentum tokenizer that is dynamically updated while the Transformer is trained. Furthermore, to better learn high-level semantic representations, we integrate contrastive learning into the proposed framework to maximize the class-token consistency between augmented point cloud pairs. Experiments show that POS-BERT achieves state-of-the-art performance on ModelNet40 linear SVM classification with a frozen feature extractor, exceeding Point-BERT by 3.5%. In addition, POS-BERT yields significant improvements on many downstream tasks, including fine-tuned classification, few-shot classification, and part segmentation. The code and trained models will be released at https://github.com/fukexue/POS-BERT.git.
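To make the described training scheme concrete, the following is a minimal PyTorch-style sketch of one pre-training step combining the MPM objective with class-token contrastive learning, with the momentum tokenizer updated as an exponential moving average of the Transformer. All names, forward signatures, loss forms, and the momentum coefficient are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of the POS-BERT pre-training step described above.
# `transformer` is the online encoder; `tokenizer` is an architecturally
# identical momentum copy updated by EMA (assumption).
import torch
import torch.nn.functional as F


@torch.no_grad()
def momentum_update(tokenizer, transformer, m=0.999):
    """EMA update: the momentum tokenizer slowly tracks the online Transformer."""
    for p_t, p_s in zip(tokenizer.parameters(), transformer.parameters()):
        p_t.data.mul_(m).add_(p_s.data, alpha=1.0 - m)


def training_step(transformer, tokenizer, optimizer, patches_a, patches_b, mask):
    """One step of masked patch modeling (MPM) + class-token contrastive learning
    on two augmented views of the same point cloud (hypothetical interfaces)."""
    optimizer.zero_grad()

    # Online Transformer sees view A with some patch embeddings masked out.
    cls_a, pred_tokens = transformer(patches_a, mask=mask)

    # Momentum tokenizer produces supervision targets from view B (no gradient).
    with torch.no_grad():
        cls_b, target_tokens = tokenizer(patches_b, mask=None)

    # MPM loss: recover the tokenizer's output at the masked positions.
    loss_mpm = F.smooth_l1_loss(pred_tokens[mask], target_tokens[mask])

    # Contrastive term, simplified here to cosine similarity between the
    # class tokens of the two augmented views.
    loss_cls = 1.0 - F.cosine_similarity(cls_a, cls_b, dim=-1).mean()

    loss = loss_mpm + loss_cls
    loss.backward()
    optimizer.step()

    # Keep the momentum tokenizer in sync after each optimizer step.
    momentum_update(tokenizer, transformer)
    return loss.item()
```

Because the tokenizer is an EMA copy rather than a separately trained dVAE, the whole pipeline runs in a single stage.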
