Abstract

Benefiting from the informative representations produced by contrastive representation learning (CRL), recent multi-modal learning studies have achieved promising clustering performance. However, existing CRL-based multi-modal clustering methods fail to simultaneously take into account the similarity information embedded at both the inter- and intra-modal levels. In this study, we explore deep multi-modal contrastive representation learning and present a multi-modal learning network, named trustworthy multi-modal contrastive clustering (TMCC), which incorporates contrastive learning and adaptively reliable sample selection into multi-modal clustering. Specifically, we design an adaptive filter that trains TMCC by progressing from ‘easy’ to ‘complex’ samples. Building on the resulting high-confidence clustering labels, we present a new contrastive loss for learning a modal-consensus representation, which accounts for not only inter-modal similarity but also intra-modal similarity. Experimental results show that these principles consistently improve the clustering performance of TMCC.
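
To make the inter-/intra-modal distinction concrete, the sketch below shows one common way such a combined contrastive objective can be written: an InfoNCE term that aligns paired samples across modalities, plus a supervised-contrastive-style term within each modality driven by pseudo-labels. This is a minimal illustration under assumed conventions (two embedding matrices z1 and z2, pseudo-labels from a confident clustering step), not TMCC's exact loss.

import torch
import torch.nn.functional as F

def inter_modal_loss(z1, z2, temperature=0.5):
    """InfoNCE across modalities: the i-th sample in modality 1 should
    match the i-th sample in modality 2 and repel all other samples."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N) cross-modal similarities
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

def intra_modal_loss(z, labels, temperature=0.5):
    """Supervised-contrastive-style term within one modality: samples sharing
    a (pseudo-)cluster label attract each other, all others repel."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # Log-softmax over all other samples (self-similarity excluded).
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')),
                                     dim=1, keepdim=True)
    # Average log-probability over each sample's positives (skip singletons).
    pos_counts = pos.sum(dim=1)
    valid = pos_counts > 0
    loss = -(log_prob * pos).sum(dim=1)[valid] / pos_counts[valid]
    return loss.mean()

# Example: 8 confidently selected samples, 16-dim features, 3 pseudo-clusters.
z1, z2 = torch.randn(8, 16), torch.randn(8, 16)
labels = torch.randint(0, 3, (8,))
total = (inter_modal_loss(z1, z2)
         + intra_modal_loss(z1, labels)
         + intra_modal_loss(z2, labels))

In this reading, the inter-modal term enforces agreement between views of the same instance, while the intra-modal term uses the high-confidence cluster assignments to pull same-cluster samples together within each modality.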
