Abstract

At present, the unsupervised visual representation learning of the point cloud model is mainly based on generative methods, but the generative methods pay too much attention to the details of each point, thus ignoring the learning of semantic information. Therefore, this paper proposes a discriminative method for the contrastive learning of three-dimensional point cloud visual representations, which can effectively learn the visual representation of point cloud models. The self-attention point cloud capsule network is designed as the backbone network, which can effectively extract the features of point cloud data. By compressing the digital capsule layer, the class dependence of features is eliminated, and the generalization ability of the model and the ability of feature queues to store features are improved. Aiming at the equivariance of the capsule network, the Jaccard loss function is constructed, which is conducive to the network distinguishing the characteristics of positive and negative samples, thereby improving the performance of the contrastive learning. The model is pre-trained on the ShapeNetCore data set, and the pre-trained model is used for classification and segmentation tasks. The classification accuracy on the ModelNet40 data set is 0.1% higher than that of the best unsupervised method, PointCapsNet, and when only 10% of the label data is used, the classification accuracy exceeds 80%. The mIoU of part segmentation on the ShapeNet data set is 1.2% higher than the best comparison method, MulUnsupervised. The experimental results of classification and segmentation show that the proposed method has good performance in accuracy. The alignment and uniformity of features are better than the generative method of PointCapsNet, which proves that this method can learn the visual representation of the three-dimensional point cloud model more effectively.

Highlights

  • A point cloud is an interactive point set with an unchanged sparse order defined in coordinate space and samples from the surface of objects to capture their spatial semantic information [1]

  • Aiming at the equivariance of the capsule network, this paper proposes a Jaccard contrast loss using Jaccard similarity coefficients to describe the similarity between features, which is conducive to the model’s distinction between positive and negative samples and improves the performance of the contrast learning method

  • Inspired by the contrastive learning method in the two-dimensional image field, this paper proposes a three-dimensional point cloud visual representation contrastive learning method for learning the visual representation of the three-dimensional point cloud model

Read more

Summary

Introduction

A point cloud is an interactive point set with an unchanged sparse order defined in coordinate space and samples from the surface of objects to capture their spatial semantic information [1]. Point clouds are obtained through 3D sensors (such as LiDAR scanners and RGB-D cameras). They can be used in human–machine interaction [2], automatic driving vehicles [3], and robot technology [4], and have a high practical application value. The manual labeling of point cloud targets is very expensive. Unsupervised visual representation learning can learn the effective visual representation of 3D point cloud targets without labeling information. The unsupervised visual representation learning of point clouds [5] has become a research hotspot

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call