Abstract

Masked point modeling (MPM) has gained considerable attention in self-supervised learning for 3D point clouds. While existing self-supervised methods have made progress in learning from point clouds, they remain limited in capturing high-level semantics. We address this limitation with Point-AGM, a novel attention-guided masking framework. Our approach introduces an attention-guided masking mechanism that selectively masks low-attended regions, enabling the model to concentrate on reconstructing more critical areas and overcoming the shortcomings of random and block masking strategies. Furthermore, we exploit the inherent advantages of the teacher-student network to enable cross-view contrastive learning on augmented dual-view point clouds, enforcing consistency between complete and partially masked views of the same 3D shape in the feature space. This unified framework leverages the complementary strengths of masked point modeling, attention-guided masking, and contrastive learning for robust representation learning. Extensive experiments demonstrate the effectiveness of our approach and its strong transferability across various downstream tasks. Specifically, our model achieves 94.12% accuracy on ModelNet40 and 87.16% on the PB-T50-RS setting of ScanObjectNN, outperforming other self-supervised learning methods.
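As a rough illustration only (not the authors' implementation, which the abstract does not specify), the core idea of attention-guided masking can be sketched as follows: given hypothetical per-point attention scores, the lowest-attended fraction of points is masked so that reconstruction focuses on the more salient regions.

```python
import numpy as np

def attention_guided_mask(attn_scores, mask_ratio=0.6):
    """Return a boolean mask hiding the lowest-attended points.

    attn_scores : (N,) hypothetical per-point attention scores.
    mask_ratio  : fraction of points to mask (an assumed parameter).
    """
    n = attn_scores.shape[0]
    n_mask = int(n * mask_ratio)
    # Sort indices by ascending attention: least-attended first.
    order = np.argsort(attn_scores)
    mask = np.zeros(n, dtype=bool)
    mask[order[:n_mask]] = True  # True = masked (low-attention region)
    return mask

# Toy example: 10 points with random scores.
rng = np.random.default_rng(0)
scores = rng.random(10)
mask = attention_guided_mask(scores, mask_ratio=0.6)
```

Under this sketch, every masked point has an attention score no greater than any visible point, which is the property that distinguishes attention-guided masking from random or block masking.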