Abstract

Fine-grained image-text retrieval aims to search for relevant images among fine-grained classes given a text query, or vice versa. The challenges lie not only in bridging the gap between two heterogeneous modalities but also in handling the large inter-class similarity and intra-class variance inherent in fine-grained data. To address these challenges, we propose a Discriminative Latent Space Learning (DLSL) method for fine-grained image-text retrieval. Concretely, image and text features are first extracted to capture the subtle differences in fine-grained data. Subsequently, based on the extracted features, we perform coupled dictionary learning to align the heterogeneous data in a unified latent space. To make this alignment discriminative enough for the fine-grained task, the learned latent space is endowed with discriminative properties via learning a discriminative map. Comprehensive experiments on fine-grained datasets demonstrate the effectiveness of our approach.
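
Since the abstract does not state the concrete objective, the following is only a minimal sketch of what a coupled dictionary learning objective with a discriminative map could look like; the symbols (image features X_v, text features X_t, modality dictionaries D_v and D_t, shared codes A, label matrix Y, discriminative map W, and the weights \lambda and \beta) are illustrative assumptions rather than the paper's notation:

\min_{D_v, D_t, A, W} \; \|X_v - D_v A\|_F^2 + \|X_t - D_t A\|_F^2 + \lambda \|A\|_1 + \beta \|Y - W A\|_F^2 \quad \text{s.t.}\; \|d_i\|_2 \le 1 \;\; \forall i,

where the first two terms couple the two modalities through the shared codes A, the \ell_1 penalty keeps the codes sparse, and the last term pushes codes from the same fine-grained class toward the same label vector under the map W, which is one standard way a latent space can be given a discriminative property.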
