Abstract

Cross-modal analysis has widespread applications, ranging from cross-media retrieval to heterogeneous face recognition. The critical problem in cross-modal analysis is correlating heterogeneous features originating from different modalities. Extensive studies have focused on discovering a shared feature space between modalities, while largely overlooking the discriminant information contained in the cross-modal data. Leveraging this discriminant information has been found effective in uncovering the underlying semantic structure and thereby facilitating end applications. In view of this, we propose a deep learning-based method that simultaneously considers the cross-modal correlation and the intra-modal discriminant information. Specifically, a unified objective function is introduced, consisting of an LDA-like discriminant part and a CCA-like correlation part. The proposed method can be easily generalized to exploit unpaired samples. Extensive experiments are conducted on three representative cross-modal analysis problems: cross-media retrieval, cross-OSN user modeling, and heterogeneous face recognition. Compared with existing state-of-the-art algorithms, the results show that the proposed method is robust to the feature dimension and achieves the best performance in all experiments.
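For intuition, the following is a minimal NumPy sketch of how a CCA-like correlation term and an LDA-like discriminant term could be combined into one objective. The function names, the scalar scatter ratio, and the weight alpha are illustrative assumptions and not the paper's actual formulation; in the proposed method these terms would be computed on the outputs of deep projection networks and maximized over their parameters.

import numpy as np

def cca_like_correlation(X, Y):
    # Mean per-dimension correlation between paired projected features
    # of two modalities (higher means stronger cross-modal correlation).
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = (Xc * Yc).sum(axis=0)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (Yc ** 2).sum(axis=0)) + 1e-8
    return (num / den).mean()

def lda_like_discriminance(X, labels):
    # Ratio of between-class to within-class scatter (higher means
    # better-separated classes within one modality).
    overall_mean = X.mean(axis=0)
    sb, sw = 0.0, 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        sb += len(Xc) * np.sum((mu_c - overall_mean) ** 2)
        sw += np.sum((Xc - mu_c) ** 2)
    return sb / (sw + 1e-8)

def unified_objective(X, Y, labels, alpha=1.0):
    # Cross-modal correlation plus intra-modal discriminance for both
    # modalities; alpha is a hypothetical trade-off weight.
    corr = cca_like_correlation(X, Y)
    disc = lda_like_discriminance(X, labels) + lda_like_discriminance(Y, labels)
    return corr + alpha * disc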
