Data-driven artificial intelligence systems enhance accuracy and robustness by using multi-view learning to aggregate consistent and complementary information from multi-source data. As one of the most important branches of multi-view learning, multi-view anchor clustering greatly reduces time complexity by learning similarity graphs between anchors and data rather than data-to-data similarities, and has therefore gained widespread attention in data-driven artificial intelligence. However, two issues remain in current methods: (1) they commonly employ orthogonal regularization to enhance anchor diversity, which may distort the anchor distribution, e.g., some clusters may have few or even no corresponding anchors; and (2) they rely only on view-sharing anchors to aggregate complementary and consistent information across views, which may fail to ensure anchor robustness because of the heterogeneity gap between views. To address these issues, self-supervised, multi-view, semantics-aware anchor clustering (SMA2C) is proposed, comprising multi-view representation alignment, adaptive anchor selection, and global spectral optimization. Specifically, SMA2C devises dual-level contrastive learning on representations and clustering partitions between views within a deep encoding–decoding architecture, achieving multi-granularity alignment of views across the heterogeneity gap. Meanwhile, SMA2C introduces adaptive anchor selection that filters outliers and refines clusters to strengthen the correlations between clusters and anchors, ensuring the diversity and discriminability of the anchors. Finally, extensive evaluations on four real-world datasets confirm that SMA2C establishes a new benchmark for multi-view anchor clustering; in particular, it achieves a 6.75% accuracy improvement over the second-best result on the HW dataset.
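The complexity reduction claimed for anchor clustering can be illustrated with a minimal sketch: instead of an n-by-n data-to-data similarity matrix, one builds an n-by-m anchor graph Z with m << n. The sketch below uses random anchor sampling purely as a stand-in for SMA2C's adaptive anchor selection, whose details are not given in the abstract; the function name and parameters are illustrative, not the authors' API.

```python
import numpy as np

def anchor_graph(X, n_anchors=16, k=5, seed=None):
    """Build an n-by-m anchor graph Z instead of an n-by-n similarity matrix.

    Anchors are chosen by simple random sampling (a placeholder for
    adaptive anchor selection). Each point is linked to its k nearest
    anchors with inverse-distance weights, then row-normalized.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    anchors = X[rng.choice(n, size=n_anchors, replace=False)]
    # squared distances to anchors: O(n * m) work instead of O(n^2)
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]        # k nearest anchors per point
    rows = np.arange(n)[:, None]
    w = 1.0 / (d2[rows, idx] + 1e-12)          # inverse-distance weights
    Z = np.zeros((n, n_anchors))
    Z[rows, idx] = w / w.sum(axis=1, keepdims=True)
    return Z, anchors
```

The full data-to-data similarity is then implicitly Z @ D @ Z.T (with D a diagonal normalizer), so downstream spectral steps operate on the small n-by-m factor rather than the n-by-n matrix.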