Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval

Feifei Zhang, Changsheng Xu, Mingliang Xu

doi:10.1145/3478642

Abstract

Composing Text and Image to Image Retrieval (CTI-IR) is an emerging task in computer vision. Its query is a reference image together with text describing desired modifications to that image, and the goal is to retrieve images that match the modified query. Most conventional cross-modal retrieval approaches take data from one modality as the query and retrieve relevant data from another modality. Unlike these methods, in this article we propose an end-to-end trainable network for simultaneous image generation and CTI-IR. The proposed model is based on a Generative Adversarial Network (GAN) and has several merits. First, by jointly training a generative model and a retrieval model, it learns a feature for the query (a query image with a text description) that is both generative and discriminative. Second, through adversarial learning between the synthesized image and the target image, our model can automatically manipulate the visual features of the reference image according to the text description. Third, it uses global-local collaborative discriminators and attention-based generators, which allow our approach to focus on both the global and the local differences between the query image and the target image. As a result, our model better preserves the semantic consistency and fine-grained details of the generated images. The generated image can also be used to interpret and strengthen our retrieval model. Quantitative and qualitative evaluations on three benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
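To make the CTI-IR setting concrete, the following is a minimal NumPy sketch of the retrieval side only. A composed query (a reference-image feature combined with a text feature) ranks gallery images by cosine similarity, and Recall@K checks whether the target image appears among the top K results. The additive fusion in `compose_query`, the 64-dimensional random features, and all function names are hypothetical placeholders. The model described above instead learns the query representation jointly with a GAN-based generator and global-local discriminators, and this sketch does not attempt to reproduce that. Recall@K is a common metric for this kind of task, but it is assumed here and is not taken from the abstract.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def compose_query(img_feat, txt_feat, alpha=1.0):
    """Hypothetical fusion: shift the reference-image feature by the text feature.
    This is a placeholder, not the paper's GAN-based composition."""
    return l2_normalize(img_feat + alpha * txt_feat)

def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    sims = l2_normalize(gallery) @ query
    return np.argsort(-sims)

def recall_at_k(ranked, target_idx, k):
    """1.0 if the target image appears in the top-k results, else 0.0."""
    return float(target_idx in ranked[:k])

# Usage with synthetic features (illustrative data only).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))   # 100 synthetic gallery-image features
target = 42
txt_feat = rng.normal(size=64)         # synthetic "modification text" feature
img_feat = gallery[target] - txt_feat  # reference built so the fused query lands on the target
query = compose_query(img_feat, txt_feat)
ranked = rank_gallery(query, gallery)
print(ranked[0], recall_at_k(ranked, target, k=1))
```

Normalizing both the query and the gallery features means that ranking by dot product is ranking by cosine similarity, which is a common choice for embedding-based retrieval. A learned composition module would replace `compose_query` without changing the ranking or evaluation steps.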

More From: ACM Transactions on Multimedia Computing, Communications, and Applications

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Publication Date: Mar 4, 2022
Citations: 12

Similar Papers

Statistical distributional approach for scale and rotation invariant color image retrieval using multivariate parametric tests and orthogonality condition
K Seetharaman ... M Jeyakarthic
Journal of Visual Communication and Image Representation | VOL. 25 | 17 Jan 2014

A survey on generative adversarial networks for imbalance problems in computer vision tasks
Vignesh Sampath ... Aitor Gutierrez
Journal of Big Data | VOL. 8 | 29 Jan 2021

Color Image Retrieval Based on Non-Parametric Statistical Tests of Hypothesis
Shekhar R ... Seetharaman K
13 Sep 2014

Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild
Canjie Luo ... Lianwen Jin
International Journal of Computer Vision | VOL. 129 | 05 Jan 2021