Unsupervised hashing has attracted extensive attention for effectively and efficiently tackling large-scale cross-modal retrieval tasks. Existing methods typically try to mine a latent common subspace across multimodal data without any category annotations. Despite this exciting progress, three challenges remain to be addressed: 1) improving the robustness of latent common subspace learning; 2) harmoniously embedding the intra-modal inherent structure and inter-modal relevance of multimodal data into Hamming space; and 3) reducing the training time complexity so that the model scales to large datasets. To address these challenges, this study proposes a method named Fast Unsupervised Cross-modal Hashing (FUCH). Specifically, FUCH introduces a semantic-aware collective matrix factorization that learns robust representations by exploiting latent category-specific attributes, and adopts the Cauchy loss to measure the factorization error. This embeds potentially discriminative information into the common space while making the model insensitive to outliers. Moreover, FUCH designs a dual projection learning scheme that learns not only modality-unique hash functions to excavate individual modality properties, but also modality-mutual hash functions to capture cross-modal correlations. Experimental results on three benchmark datasets verify the effectiveness of FUCH under various scenarios.
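For concreteness, a minimal sketch of a Cauchy-loss collective matrix factorization objective of the kind described above; the exact formulation, weighting, and regularization used by FUCH may differ, and the symbols $X^{(m)}$, $U^{(m)}$, $V$, $\gamma$, and $\Omega$ are illustrative assumptions rather than the paper's notation:

$$
\min_{\{U^{(m)}\},\,V}\;\sum_{m=1}^{M}\sum_{i,j}\log\!\Big(1+\frac{\big(X^{(m)}_{ij}-[\,U^{(m)}V\,]_{ij}\big)^{2}}{\gamma^{2}}\Big)\;+\;\lambda\,\Omega\big(\{U^{(m)}\},V\big),
$$

where $X^{(m)}$ is the feature matrix of the $m$-th modality, $U^{(m)}$ a modality-specific basis, $V$ the shared latent representation from which binary codes are derived, $\gamma$ the Cauchy scale parameter, and $\Omega$ a regularizer. The logarithmic growth of the Cauchy loss bounds the contribution of large residuals, which is the mechanism behind the insensitivity to outliers claimed above.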