Incomplete multi-view clustering (IMVC) combined with contrastive learning has shown strong performance in unsupervised learning. Existing methods, however, face two challenges. First, they rely on additional projection heads to avoid dimensionality collapse, which introduces redundant parameters. Second, because consistency learning and reconstruction take place in the same feature space, the encoder-derived features, which also carry view-specific information, can mislead the learning of common semantics. To address these issues, we propose a novel framework for incomplete multi-view contrastive clustering. The framework employs an encoder network with a self-attention mechanism and applies the reconstruction loss to the learned feature vectors and the contrastive loss to their sub-vectors, which mitigates the influence of extraneous view-private information on consistency learning. Because consistency is learned on sub-vectors, the model refines the latent feature subspace directly and avoids dimensionality collapse without depending on projection heads. The method also incorporates a cross-view prediction mechanism to recover missing information in incomplete datasets. Comprehensive experiments on public datasets demonstrate that our method achieves state-of-the-art clustering performance.
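The division of labor between the two losses can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sub-vectors are selected, so taking the leading `sub_dim` coordinates, using cosine-similarity InfoNCE with a fixed `temperature`, and the function names `info_nce`, `mse`, and `imvc_losses` are all assumptions made for illustration. The self-attention encoder and the cross-view prediction step are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(za, zb, temperature=0.5):
    """InfoNCE loss: row i of za is the positive for row i of zb,
    and every other row of zb is a negative."""
    n = len(za)
    total = 0.0
    for i in range(n):
        logits = [cosine(za[i], zb[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / n


def mse(xs, ys):
    """Mean squared error over all entries of two equally shaped row lists."""
    diffs = [(a - b) ** 2 for x, y in zip(xs, ys) for a, b in zip(x, y)]
    return sum(diffs) / len(diffs)


def imvc_losses(x1, x2, z1, z2, decode1, decode2, sub_dim, temperature=0.5):
    """Return (reconstruction, contrastive) for two views.

    Reconstruction decodes the FULL feature vectors, so view-private
    information may live anywhere in them. The contrastive term only sees
    the first sub_dim coordinates (assumed here to be the shared sub-vector).
    """
    reconstruction = (mse(x1, [decode1(z) for z in z1])
                      + mse(x2, [decode2(z) for z in z2]))
    shared1 = [z[:sub_dim] for z in z1]
    shared2 = [z[:sub_dim] for z in z2]
    # Symmetrize so neither view is treated as the sole anchor.
    contrastive = 0.5 * (info_nce(shared1, shared2, temperature)
                         + info_nce(shared2, shared1, temperature))
    return reconstruction, contrastive
```

In this sketch the coordinates beyond `sub_dim` never enter the contrastive term, so two views can disagree freely there without being penalized for inconsistency, while the decoders still have access to them for reconstruction.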