Abstract
Matching hand-drawn sketches with photos (a.k.a. sketch-photo recognition or re-identification) faces an information asymmetry challenge due to the abstract nature of the sketch modality. Existing works tend to learn shared embedding spaces with CNN models, either by discarding the appearance cues of photo images or by introducing GANs for sketch-photo synthesis. The former unavoidably loses discriminability, while the latter contains generation noise that cannot be removed. In this paper, we make the first attempt to design an information-aligned sketch transformer (SketchTrans+) via cross-modal disentangled prototype learning, motivated by the great promise transformers have shown for discriminative visual modelling. Specifically, we design an asymmetric disentanglement scheme with a dynamically updatable auxiliary sketch (A-sketch) to align the modality representations without sacrificing information. The asymmetric disentanglement decomposes the photo representations into sketch-relevant and sketch-irrelevant cues and transfers the sketch-irrelevant knowledge into the sketch modality to compensate for the missing information. Moreover, considering the feature discrepancy between the two modalities, we present a modality-aware prototype contrastive learning method that mines representative modality-shared information using modality-aware prototypes rather than the original feature representations. Extensive experiments on category- and instance-level sketch-based datasets validate the superiority of our proposed method under various metrics.
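To make the idea of contrasting features against modality-aware prototypes more concrete, the following is a minimal sketch of a generic cross-modal prototype contrastive loss. It is not the SketchTrans+ formulation: it ignores the A-sketch and the asymmetric disentanglement entirely, and it assumes that (a) each modality's prototype for a class is the re-normalized mean of that modality's L2-normalized features, (b) each feature is scored with an InfoNCE-style loss against the other modality's prototypes, with the same-class prototype as the positive, and (c) every label appears in both modalities. All function names and the temperature `tau` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    # Guard against zero vectors so normalization never divides by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def modality_prototypes(features: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> Dict[int, Vector]:
    """Hypothetical per-modality prototypes: mean of normalized features per class, re-normalized."""
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(l2_normalize(f))
    protos: Dict[int, Vector] = {}
    for y, fs in groups.items():
        dim = len(fs[0])
        mean = [sum(f[d] for f in fs) / len(fs) for d in range(dim)]
        protos[y] = l2_normalize(mean)
    return protos


def prototype_contrastive_loss(features: Sequence[Sequence[float]],
                               labels: Sequence[int],
                               protos: Dict[int, Vector],
                               tau: float = 0.1) -> float:
    """InfoNCE of each feature against another modality's prototypes (positive = same class)."""
    classes = sorted(protos)
    total = 0.0
    for f, y in zip(features, labels):
        z = l2_normalize(f)
        logits = [dot(z, protos[c]) / tau for c in classes]
        # Numerically stable log-sum-exp over all prototypes.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[classes.index(y)]
    return total / len(features)


def cross_modal_loss(sketch_feats, sketch_labels, photo_feats, photo_labels,
                     tau: float = 0.1) -> float:
    """Symmetric loss: sketch features vs. photo prototypes, and photo features vs. sketch prototypes."""
    sketch_protos = modality_prototypes(sketch_feats, sketch_labels)
    photo_protos = modality_prototypes(photo_feats, photo_labels)
    return 0.5 * (
        prototype_contrastive_loss(sketch_feats, sketch_labels, photo_protos, tau)
        + prototype_contrastive_loss(photo_feats, photo_labels, sketch_protos, tau)
    )
```

For example, with toy 2-D features, `cross_modal_loss([[1, 0], [0.9, 0.1], [0, 1]], [0, 0, 1], [[1, 0.1], [0.1, 1]], [0, 1])` is close to zero because same-class sketch and photo features point in similar directions, whereas swapping the photo labels to `[1, 0]` makes the loss large. Contrasting against prototypes rather than individual cross-modal features averages out per-sample noise, which is the intuition the abstract gives for using prototypes instead of the original feature representations.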
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence