Abstract

Self-supervised learning methods learn latent features from unlabeled data by designing pretext tasks, and have therefore attracted great interest for sample-efficient learning. As a promising self-supervised scheme, masked autoencoding has significantly advanced natural language processing and computer vision, but has not yet been extended to point clouds. In this paper, we propose a novel masked autoencoder scheme for 3D point cloud self-supervised learning, addressing the particular challenges posed by point clouds, including leakage of location information and uneven information density. Concretely, we divide the input point cloud into irregular point patches and randomly mask them at a high ratio. A standard Transformer-based autoencoder, with an asymmetric design and a mask-token shifting operation, then learns high-level latent features from the unmasked point patches and reconstructs the masked ones. Extensive experiments show that our approach is efficient during pre-training and generalizes well to various downstream tasks. Beyond the proposed method, we also discuss potential directions for 3D point cloud self-supervised learning, including improvements to masked autoencoding and developments in point cloud scene understanding.
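
To make the patching-and-masking step concrete, the following is a minimal PyTorch sketch of dividing a point cloud into irregular patches and masking them at a high ratio. The function names, patch counts, the 60% mask ratio, and the use of random center sampling in place of a scheme such as farthest point sampling are illustrative assumptions, not the paper's exact implementation.

import torch

def group_patches(points, num_patches=64, patch_size=32):
    # Divide an (N, 3) point cloud into irregular patches.
    # Patch centers are drawn by random sampling here for brevity;
    # farthest point sampling is a common alternative (assumption).
    idx = torch.randperm(points.shape[0])[:num_patches]
    centers = points[idx]                                   # (G, 3)
    # The k nearest neighbors of each center form one patch.
    dist = torch.cdist(centers, points)                     # (G, N)
    knn_idx = dist.topk(patch_size, largest=False).indices  # (G, K)
    patches = points[knn_idx]                               # (G, K, 3)
    # Normalizing each patch to its center keeps absolute position out
    # of the patch content, mitigating location-information leakage.
    return patches - centers.unsqueeze(1), centers

def random_mask(num_patches=64, mask_ratio=0.6):
    # Randomly mask a high ratio of patches; True marks a masked patch.
    num_masked = int(num_patches * mask_ratio)
    perm = torch.randperm(num_patches)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[perm[:num_masked]] = True
    return mask

points = torch.rand(1024, 3)             # toy point cloud
patches, centers = group_patches(points)
mask = random_mask()
visible = patches[~mask]                 # only these reach the encoder

In this setup only the visible patches are encoded, while the masked patches serve as the reconstruction target, matching the asymmetric design described above.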
