Despite significant advances in the deep clustering research, there remain three critical limitations to most of the existing approaches. First, they often derive the clustering result by associating some distribution-based loss to specific network layers, neglecting the potential benefits of leveraging the contrastive sample-wise relationships. Second, they frequently focus on representation learning at the full-image scale, overlooking the discriminative information latent in partial image regions. Third, although some prior studies perform the learning process at multiple levels, they mostly lack the ability to exploit the interaction between different learning levels. To overcome these limitations, this paper presents a novel deep image clustering approach via Partial Information discrimination and Cross-level Interaction (PICI). Specifically, we utilize a Transformer encoder as the backbone, coupled with two types of augmentations to formulate two parallel views. The augmented samples, integrated with masked patches, are processed through the Transformer encoder to produce the class tokens. Subsequently, three partial information learning modules are jointly enforced, namely, the partial information self-discrimination (PISD) module for masked image reconstruction, the partial information contrastive discrimination (PICD) module for the simultaneous instance- and cluster-level contrastive learning, and the cross-level interaction (CLI) module to ensure the consistency across different learning levels. Through this unified formulation, our PICI approach for the first time, to our knowledge, bridges the gap between the masked image modeling and the deep contrastive clustering, offering a novel pathway for enhanced representation learning and clustering. Experimental results across six image datasets demonstrate the superiority of our PICI approach over the state-of-the-art. In particular, our approach achieves an ACC of 0.772 (0.634) on the RSOD (UC-Merced) dataset, which shows an improvement of 29.7% (24.8%) over the best baseline. The source code is available at https://github.com/Regan-Zhang/PICI.
Read full abstract