Canonical correlation analysis (CCA) is a powerful tool for analyzing multi-dimensional paired data. However, when facing semi-supervised multi-modal data (also called multi-view (Hou et al., Pattern Recognition 43(3):720–730, 2010) or multi-represented (Kailing et al., Clustering multi-represented objects with noise, in: Proceedings of the Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Sydney, Australia, pp. 394–403) data; for convenience, we uniformly call them multi-modal data hereafter), which is widespread in real-world applications, CCA usually performs poorly because it ignores useful supervised information. Meanwhile, because labeled training samples are scarce in the semi-supervised scenario, supervised extensions of CCA suffer from overfitting. Several semi-supervised extensions of CCA have been proposed recently. Nevertheless, they either exploit only the global structural information captured from the unlabeled data, or propagate label information by discovering, in advance, the affinities between the labeled and unlabeled data points only. In this paper, we propose a robust multi-modal semi-supervised feature extraction and fusion framework, termed dual structural consistency based multi-modal correlation propagation projections (SCMCPP). SCMCPP guarantees the consistency between the representation structure and the hypotaxis structure within each modality and ensures the consistency of the hypotaxis structure between two different modalities. By iteratively propagating labels and learning affinities, the discriminative information of both the given and the estimated labels is utilized to improve the affinity construction and infer the remaining unknown labels. Moreover, probabilistic within-class scatter matrices in each modality and a probabilistic correlation matrix between the two modalities are constructed to enhance the discriminative power of the features. Extensive experiments on several benchmark face databases demonstrate the effectiveness of our approach.
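For reference, the sketch below shows classical (unsupervised) CCA, the baseline that SCMCPP extends; it does not reproduce the paper's label propagation, dual structural consistency, or probabilistic scatter/correlation matrices. Variable names (X, Y, n_components, reg) are illustrative assumptions, not notation from the paper.

```python
# Minimal classical CCA via whitening + SVD (illustrative sketch only).
import numpy as np

def cca(X, Y, n_components=2, reg=1e-6):
    """Return projection matrices Wx, Wy and the top canonical correlations.

    X: (n_samples, dx) view-1 data; Y: (n_samples, dy) view-2 data.
    reg is a small ridge term for numerical stability with few samples.
    """
    # Center each view.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # (Regularized) covariance and cross-covariance matrices.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    # Whitening transforms Kx, Ky with Kx^T Cxx Kx = I (via Cholesky factors).
    Kx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Ky = np.linalg.inv(np.linalg.cholesky(Cyy)).T

    # SVD of the whitened cross-covariance gives the canonical directions.
    U, s, Vt = np.linalg.svd(Kx.T @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 2))  # shared signal across the two views
    X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))
    Y = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(200, 8))
    Wx, Wy, corrs = cca(X, Y, n_components=2)
    print("top canonical correlations:", np.round(corrs, 3))
```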