Deep quantization network with visual-semantic alignment for zero-shot image retrieval

Huixia Liu, Zhihong Qin

doi:10.3934/era.2023215

Abstract

Approximate nearest neighbor (ANN) search has become an essential paradigm for large-scale image retrieval. Conventional ANN search requires the categories of query images to have been seen in the training set. However, given the rapid evolution of newly emerging concepts on the web, it is too expensive to retrain the model by collecting labeled data for the new (unseen) concepts. Existing zero-shot hashing methods choose the semantic space or an intermediate space as the embedding space; this choice ignores the inconsistency between the visual space and the semantic space and suffers from the hubness problem in the zero-shot image retrieval task. In this paper, we present a novel deep quantization network with visual-semantic alignment for efficient zero-shot image retrieval. Specifically, we adopt a multi-task architecture that is capable of 1) learning discriminative and polymeric image representations for facilitating the visual-semantic alignment; 2) learning discriminative semantic embeddings for knowledge transfer; and 3) learning compact binary codes for aligning the visual space and the semantic space. We compare the proposed method with several state-of-the-art methods on several benchmark datasets, and the experimental results validate the superiority of the proposed method.

Full Text