Abstract

Cross-modal clustering (CMC) methods exploit the correlation information among multiple modalities to improve clustering performance. However, the pronounced differences between heterogeneous modalities make this correlation information difficult to obtain directly. In this paper, we propose a novel method for CMC, Contrastive Cross-modal Clustering with Twin Network (3CTnet), which contrasts multiple modalities against one another to fully mine their correlation information. 3CTnet consists of two modality-specific encoders and an attention-based correlation propagation module (CPM). First, the modality-specific encoders are trained with pseudo-labels to learn the clustering structure and features of each single modality. We then contrast the clustering structures and features of different modalities to capture inter-cluster and inter-feature correlations simultaneously. Finally, the CPM propagates the learned correlations among the modality-specific encoders to further refine the features and clustering structures. Experiments show that 3CTnet outperforms state-of-the-art CMC methods on six large datasets.
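Since only the abstract is available here, the following is a minimal PyTorch sketch of the kind of pipeline it describes: two modality-specific encoders that each produce features and soft cluster assignments, a contrastive loss applied at both the feature level and the cluster level, and an attention-based correlation propagation step. All names (ModalEncoder, CorrelationPropagation, info_nce) and design details are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the twin-encoder contrastive setup; names and
# architecture choices are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalEncoder(nn.Module):
    """Modality-specific encoder: maps one modality to unit-norm features
    and soft cluster assignments (trained with pseudo-labels per the abstract)."""
    def __init__(self, in_dim, feat_dim, n_clusters):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )
        self.cluster_head = nn.Linear(feat_dim, n_clusters)

    def forward(self, x):
        z = F.normalize(self.backbone(x), dim=1)    # feature embedding
        q = F.softmax(self.cluster_head(z), dim=1)  # soft cluster assignment
        return z, q

def info_nce(a, b, tau=0.5):
    """Symmetric InfoNCE over paired rows of a and b; one plausible choice
    for the cross-modal contrastive objective."""
    logits = a @ b.t() / tau
    target = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, target) +
                  F.cross_entropy(logits.t(), target))

class CorrelationPropagation(nn.Module):
    """Attention-based stand-in for the paper's CPM: each modality's
    features attend to the other modality's before re-clustering."""
    def __init__(self, feat_dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)

    def forward(self, z_self, z_other):
        out, _ = self.attn(z_self.unsqueeze(1),
                           z_other.unsqueeze(1),
                           z_other.unsqueeze(1))
        return out.squeeze(1)

# Toy usage on two paired modalities of different dimensionality.
enc_a, enc_b = ModalEncoder(64, 128, 10), ModalEncoder(32, 128, 10)
cpm = CorrelationPropagation(128)
xa, xb = torch.randn(8, 64), torch.randn(8, 32)
za, qa = enc_a(xa)
zb, qb = enc_b(xb)
# Inter-feature contrast (rows = paired samples) plus inter-cluster
# contrast (columns = per-cluster assignment profiles).
loss = info_nce(za, zb) + info_nce(F.normalize(qa.t(), dim=1),
                                   F.normalize(qb.t(), dim=1))
za_prop = cpm(za, zb)  # propagate cross-modal correlation back to modality A
```

Contrasting the transposed assignment matrices treats each cluster's assignment profile as a representation, a common way to realize cluster-level contrast; whether 3CTnet does exactly this cannot be confirmed from the abstract alone.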
