Abstract

Current supervised methods for 3D shape representation learning achieve satisfactory performance but require extensive human-labeled datasets. Unsupervised methods offer a viable alternative by learning shape representations without ground-truth labels. In this study, we develop a contrastive learning framework for unsupervised representation learning of 3D shapes. Specifically, to encourage the model to focus on useful information during representation learning, we first introduce a new adversarial paradigm for critical point search. We extract critical points that have a larger impact on the global feature by attacking a pre-trained auto-encoder, and apply data augmentations to these points to generate adversarial examples. Taking a pair of adversarial examples as inputs, we obtain their intermediate embeddings and global representations, which two predictor heads then map into latent spaces. Finally, we train the model by maximizing agreement in these latent spaces using the Normalized Temperature-scaled Cross Entropy (NT-Xent) loss and a Cross-layer Normalized Temperature-scaled Cross Entropy (Cross-NT-Xent) loss, which we propose in this paper to enforce cross-layer feature similarity. We validate the effectiveness, robustness, and transferability of the learned representations on three downstream tasks: object classification, few-shot classification, and shape retrieval. Experiments on three benchmark datasets show that our learned representations achieve performance better than or competitive with current state-of-the-art methods on these tasks. Moreover, our model can easily be extended to 3D part segmentation and scene segmentation.
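The abstract names the NT-Xent loss without defining it. The sketch below shows the standard NT-Xent formulation (as popularized by SimCLR) in NumPy. It assumes that a batch holds N positive pairs, here taken to be the latent vectors of the two adversarial examples of the same shape, and that the remaining 2N - 2 samples in the batch act as negatives. The function name `nt_xent` and the default temperature of 0.5 are illustrative assumptions, not values from the paper. The paper's own Cross-NT-Xent variant is not reproduced, because the abstract does not give its exact form.

```python
import numpy as np


def nt_xent(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """Standard NT-Xent loss for N positive pairs (z1[i], z2[i]).

    Hypothetical helper for illustration; z1 and z2 have shape (N, d).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, d) stacked views
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize so dot product = cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # drop self-similarity from the denominator
    # The positive for sample i is i + N (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    # Average the negative log-likelihood of the positive over all 2N anchors.
    return float(-log_prob_pos.mean())
```

For example, `nt_xent(np.eye(4), np.eye(4))` (perfectly matched, mutually orthogonal pairs) yields a lower loss than pairing each vector with a different basis vector, which is the agreement-maximizing behavior the training objective relies on.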
