Abstract

This paper focuses on learning a feature extractor that can extract the shared information from multi-view data in an unsupervised manner. Maximizing the mutual information (MI) between views is a common way to achieve this, and a contrastive loss (CL) is usually used to approximate the MI. However, three challenges remain. The first is that minimizing the contrastive loss can push apart two semantically similar instances. The second is that CL-based models are sensitive to batch size. The third is that CL-based models mine shared information between views in an instance-wise manner and therefore cannot capture the discriminative information contained in the overall distributions of the different views. Here, we propose a new method called the Local Consistency and Global Alignment Network (LCGAN) to address these challenges. LCGAN replaces the contrastive loss with a simple clustering-based "swap" prediction training mechanism, which alleviates the first two challenges. To alleviate the third challenge, LCGAN uses the Wasserstein distance to align the distributions of the different views. Empirical evaluations on a series of real ship datasets and several benchmark datasets demonstrate the effectiveness of the proposed LCGAN.
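
The sketch below illustrates, in PyTorch, the two ideas named in the abstract: a clustering-based "swap" prediction loss between two views and a Wasserstein-style alignment of their feature distributions. It is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the use of soft prototype assignments as swap targets, the sliced-Wasserstein approximation, and all hyperparameters (n_projections, temperature, n_prototypes) are hypothetical choices.

```python
# Hypothetical sketch of a swap-prediction loss plus a sliced-Wasserstein
# alignment term, assuming PyTorch. Not the paper's actual implementation.
import torch
import torch.nn.functional as F


def sliced_wasserstein(x, y, n_projections=64):
    """Approximate a Wasserstein distance between two equally sized feature
    batches by averaging 1-D transport costs over random projections."""
    d = x.size(1)
    # Random unit directions onto which both batches are projected.
    directions = F.normalize(torch.randn(d, n_projections, device=x.device), dim=0)
    x_proj = (x @ directions).sort(dim=0).values
    y_proj = (y @ directions).sort(dim=0).values
    return (x_proj - y_proj).pow(2).mean()


def swapped_prediction(z1, z2, prototypes, temperature=0.1):
    """Clustering-based 'swap' prediction: each view predicts the prototype
    assignment computed from the other view (assignments here are simple
    detached softmaxes, an illustrative simplification)."""
    p1 = F.normalize(z1, dim=1) @ F.normalize(prototypes, dim=1).t()
    p2 = F.normalize(z2, dim=1) @ F.normalize(prototypes, dim=1).t()
    q1 = F.softmax(p1 / temperature, dim=1).detach()
    q2 = F.softmax(p2 / temperature, dim=1).detach()
    loss_12 = -(q2 * F.log_softmax(p1 / temperature, dim=1)).sum(dim=1).mean()
    loss_21 = -(q1 * F.log_softmax(p2 / temperature, dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_12 + loss_21)


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, dim, n_prototypes = 32, 128, 10
    # Stand-ins for encoder outputs of the two views of the same instances.
    z_view1, z_view2 = torch.randn(batch, dim), torch.randn(batch, dim)
    prototypes = torch.randn(n_prototypes, dim, requires_grad=True)
    total = swapped_prediction(z_view1, z_view2, prototypes) \
            + sliced_wasserstein(z_view1, z_view2)
    print(f"combined loss: {total.item():.4f}")
```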
