Abstract

Fine-grained sketch-based image retrieval (FG-SBIR) is considered an ideal method of image retrieval because sketches are expressive and easily accessible. It aims to find the photo in a gallery that is most similar to an input sketch. Most previous works follow the same paradigm: they first extract global features and then project the sketch and photo features into a unified embedding space using a triplet loss. However, global features are not well suited to capturing the crucial fine-grained information. Based on this observation, we propose a Dual Local Interaction Network (DLI-Net), which explores an effective and efficient way to utilize local features for FG-SBIR. Specifically, we first propose a Local Feature Extractor to extract mid-level local features. Then, to address the problems that local features introduce, we propose a Dual Interaction Module, which consists of a Self Interaction Module and a Cross Interaction Module. The Self Interaction Module speeds up retrieval by eliminating redundant local features from the background. The Cross Interaction Module resolves spatial misalignment by letting sketches interact with photos. Extensive experiments on six commonly used datasets show that DLI-Net outperforms state-of-the-art competitors by a significant margin while maintaining a reasonable retrieval speed. Moreover, to the best of our knowledge, DLI-Net is the first model to beat humans on all six datasets. DLI-Net also performs best on the cross-category FG-SBIR task, which further demonstrates that local features are more appropriate for FG-SBIR.
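
For readers unfamiliar with the global-feature paradigm that the abstract contrasts itself with, the following is a minimal sketch of a triplet loss over global embeddings and of nearest-neighbour retrieval in the shared space. It is not DLI-Net and not the authors' code: the function names, the Euclidean distance, the margin value of 0.2, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(sketch, pos_photo, neg_photo, margin=0.2):
    """Standard triplet loss: pull the matching photo closer to the sketch
    than the non-matching photo by at least `margin` (assumed value)."""
    d_pos = l2_distance(sketch, pos_photo)
    d_neg = l2_distance(sketch, neg_photo)
    return max(0.0, margin + d_pos - d_neg)


def retrieve(sketch, gallery):
    """Return the index of the gallery embedding closest to the sketch."""
    return min(range(len(gallery)), key=lambda i: l2_distance(sketch, gallery[i]))


# Toy global embeddings (hypothetical values, not from any dataset).
sketch = [0.0, 0.0]
matching_photo = [0.0, 1.0]
other_photo = [3.0, 4.0]

loss_ok = triplet_loss(sketch, matching_photo, other_photo)
loss_bad = triplet_loss(sketch, other_photo, matching_photo)
best = retrieve(sketch, [other_photo, matching_photo])
```

In this baseline each image is reduced to a single vector, so the loss cannot see which local regions of the sketch and the photo correspond. That limitation is what motivates the abstract's move to local features.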

