Abstract

Composed image retrieval (CIR) aims to fuse a reference image and text feedback to search for the desired target images. Compared with general image retrieval, it can model users' search intent more comprehensively and retrieve target images more accurately, which makes it valuable in many real-world applications such as e-commerce and Internet search. However, the heterogeneous semantic gap between images and text makes it difficult to jointly understand and fuse the two modalities. To tackle this problem, we propose MCR, an end-to-end framework that takes an image and text together as the retrieval query. The framework consists of four pivotal modules. Specifically, we introduce a Relative Caption-aware Consistency (RCC) constraint that aligns text pieces with images in the database, which effectively bridges the heterogeneous gap. The Multi-modal Complementary Fusion (MCF) and Cross-modal Guided Pooling (CGP) modules mine multiple interactions between local image features and text word features and learn a complementary representation of the composed query. Furthermore, we develop a plug-and-play Weak-text Semantic Augment (WSA) module for datasets with short or incomplete query texts; it supplements the weak-text features and helps model an augmented semantic space. Extensive experiments on several benchmarks demonstrate that MCR outperforms existing state-of-the-art methods.
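To make the CIR setting above concrete, here is a minimal sketch of the generic retrieval loop: a reference-image feature and a text-feedback feature are fused into one composed query, and database images are ranked by cosine similarity to it. This is not the MCR method. The weighted-sum fusion stands in for the MCF and CGP modules, and it assumes both features are already precomputed vectors in a shared embedding space. The names `compose_query`, `rank_gallery`, and `alpha` are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def compose_query(image_feat: Sequence[float], text_feat: Sequence[float],
                  alpha: float = 0.5) -> List[float]:
    """Hypothetical weighted-sum fusion of image and text features.

    A simple baseline used only for illustration; it is NOT the paper's
    MCF/CGP modules, which model word-to-region interactions.
    """
    if len(image_feat) != len(text_feat):
        raise ValueError("image and text features must share one embedding dimension")
    img = l2_normalize(image_feat)
    txt = l2_normalize(text_feat)
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(img, txt)]
    return l2_normalize(fused)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_gallery(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return database indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    # sorted() is stable even with reverse=True, so ties keep database order.
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    reference = [1.0, 0.0, 0.0]   # toy reference-image feature
    feedback = [0.0, 1.0, 0.0]    # toy text-feedback feature
    database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    q = compose_query(reference, feedback)
    print(rank_gallery(q, database))  # [2, 0, 1, 3]
```

With toy 3-d features, the database vector that combines both the reference content and the requested change (index 2) ranks first. The paper's contribution lies in replacing the naive fusion step with learned modules and an alignment constraint (RCC), which this sketch does not attempt to reproduce.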
