FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval

Yanzhe Chen, Jiahuan Zhou, Lele Cheng, Xiangteng He, Huasong Zhong, Yuxin Peng

doi:10.1609/aaai.v38i2.27885

Abstract

The goal of composed fashion image retrieval is to locate a target image based on a reference image and modified text. Recent methods utilize symmetric encoders (e.g., CLIP) pre-trained on large-scale non-fashion datasets. However, the input for this task exhibits an asymmetric nature, where the reference image contains rich content while the modified text is often brief. Therefore, methods employing symmetric encoders encounter a severe phenomenon: retrieval results dominated by reference images, leading to the oversight of modified text. We propose a Fashion Enhance-and-Refine Network (FashionERN) centered around two aspects: enhancing the text encoder and refining visual semantics. We introduce a Triple-branch Modifier Enhancement model, which injects relevant information from the reference image and aligns the modified text modality with the target image modality. Furthermore, we propose a Dual-guided Vision Refinement model that retains critical visual information through text-guided refinement and self-guided refinement processes. The combination of these two models significantly mitigates the reference dominance phenomenon, ensuring accurate fulfillment of modifier requirements. Comprehensive experiments demonstrate our approach's state-of-the-art performance on four commonly used datasets.
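The reference dominance phenomenon described above can be illustrated with a toy sketch. This is not FashionERN itself, nor its Triple-branch Modifier Enhancement or Dual-guided Vision Refinement models. It assumes a plain additive fusion of a reference-image embedding and a modifier-text embedding, and all vectors, names, and the `text_weight` parameter are hypothetical values chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compose(ref_emb, text_emb, text_weight=1.0):
    """Additive fusion (an assumed baseline): reference + weighted modifier."""
    return [r + text_weight * t for r, t in zip(ref_emb, text_emb)]


def rank(query, gallery):
    """Return gallery names sorted by descending cosine similarity to the query."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)


# Hypothetical 3-d embeddings with axes [garment shape, red, blue].
REFERENCE = [1.0, 1.0, 0.0]   # content-rich reference image: a red dress
MODIFIER = [0.0, -0.3, 0.3]   # brief modifier text: "make it blue" (small magnitude)
GALLERY = {
    "near_duplicate": [1.0, 1.0, 0.0],  # another red dress, close to the reference
    "target": [1.0, 0.0, 1.0],          # the blue dress the modifier asks for
}

# With equal weighting, the weak text signal barely moves the query.
plain = rank(compose(REFERENCE, MODIFIER), GALLERY)

# Strengthening the text contribution lets the modifier change the ranking.
boosted = rank(compose(REFERENCE, MODIFIER, text_weight=4.0), GALLERY)

print("plain fusion top-1:  ", plain[0])
print("boosted fusion top-1:", boosted[0])
```

In this toy setting the equally weighted query stays closest to the reference look-alike, while the query with a stronger text contribution retrieves the item that satisfies the modifier. A fixed scalar weight is only a stand-in here; the abstract describes learned enhancement of the text encoder and refinement of visual semantics instead.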

More From: Proceedings of the AAAI Conference on Artificial Intelligence
