Abstract

Fine-grained Sketch-based Image Retrieval (FG-SBIR), which uses hand-drawn sketches to retrieve images of target objects, has recently drawn much attention. It is a challenging task because sketches and images belong to different modalities, and sketches are highly abstract and ambiguous. Existing solutions either focus on visual comparisons between sketches and images while ignoring the multimodal characteristics of annotated images, or treat retrieval as a one-time process. In this paper, we formulate FG-SBIR as a coarse-to-fine process and propose a Deep Cascaded Cross-modal Ranking Model (DCCRM) that exploits all the beneficial multimodal information in sketches and annotated images, improving both retrieval efficiency and the effectiveness of the top-K ranked results. We concentrate on constructing deep representations for sketches, images, and descriptions, and on learning optimized deep correlations across these domains. For a given query sketch, relevant images with fine-grained instance-level similarities within a specific category can thus be returned, satisfying the strict instance-level requirement of FG-SBIR. Experiments on a large quantity of public data have produced very positive results.
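As a rough illustration of the coarse-to-fine idea described above, the sketch below (in PyTorch) embeds sketches, images, and descriptions into a shared space with a triplet ranking loss, uses a coarse category prediction to prune the image gallery, and ranks the surviving candidates by fine-grained embedding similarity. Every module name, dimension, and the two-stage retrieval logic here are illustrative assumptions of ours, not the authors' DCCRM; random tensors stand in for real sketch, image, and description features.

# Minimal coarse-to-fine cross-modal ranking sketch (illustrative, not DCCRM).
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, EMB_DIM, NUM_CATEGORIES = 512, 128, 10

class ModalityEncoder(nn.Module):
    """Maps precomputed features of one modality into a shared embedding space."""
    def __init__(self, in_dim=FEAT_DIM, out_dim=EMB_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit norm: dot product = cosine

sketch_enc, image_enc, text_enc = ModalityEncoder(), ModalityEncoder(), ModalityEncoder()
coarse_head = nn.Linear(EMB_DIM, NUM_CATEGORIES)  # coarse stage: category prediction

# Training signal: pull a sketch toward its matching image and push it away
# from a non-matching one. Descriptions (text_enc) would enter analogous triplets.
triplet = nn.TripletMarginLoss(margin=0.2)
anchor = sketch_enc(torch.randn(32, FEAT_DIM))   # stand-in sketch features
pos = image_enc(torch.randn(32, FEAT_DIM))       # matching image features
neg = image_enc(torch.randn(32, FEAT_DIM))       # non-matching image features
triplet(anchor, pos, neg).backward()

# Retrieval: the coarse stage keeps only gallery images whose predicted category
# matches the query sketch; the fine stage ranks them by embedding similarity.
with torch.no_grad():
    query = sketch_enc(torch.randn(1, FEAT_DIM))
    gallery = image_enc(torch.randn(1000, FEAT_DIM))
    keep = (coarse_head(gallery).argmax(-1) == coarse_head(query).argmax(-1)).nonzero(as_tuple=True)[0]
    if len(keep) == 0:                            # degenerate case: fall back to full gallery
        keep = torch.arange(len(gallery))
    scores = gallery[keep] @ query.squeeze(0)     # cosine similarity in the shared space
    top_k = keep[scores.topk(min(10, len(keep))).indices]  # final instance-level ranking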
