AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval

Hongguang Zhu, Shujuan Huang, Chunjie Zhang, Yunchao Wei, Yao Zhao

doi:10.1145/3584703

Abstract

Text-guided image retrieval combines a reference image and text feedback into a multimodal query to search for the image that matches the user's intention. Recent approaches employ multi-level matching, multiple accesses, or multiple subnetworks to improve performance, regardless of the heavy storage and computation burden this imposes at deployment. Additionally, these models not only rely on expert knowledge to handcraft image-text composing modules but also perform inference with a static computational graph. This limits the representation capability and generalization ability of the networks when faced with complex and varied combinations of reference image and text feedback. To break the shackles of the static network concept, we introduce a dynamic router mechanism that achieves data-dependent expert activation and flexible collaboration among multiple experts, in order to explore more implicit multimodal fusion patterns. Specifically, we construct AMC, our Adaptive Multi-expert Collaborative network, by using the proposed router to activate different experts with different levels of image-text interaction. Since the routers dynamically adjust the activation of experts for the current sample, AMC can adapt its fusion mode to different reference image and text combinations and generate dynamic computational graphs according to varied multimodal queries. Extensive experiments on two benchmark datasets demonstrate that, thanks to the image-text composing representation produced by the adaptive multi-expert collaboration mechanism, AMC achieves better retrieval performance and zero-shot generalization ability than the state-of-the-art method while remaining lightweight and fast at retrieval. Moreover, we visualize path activation, attention maps, and retrieval results to further understand the routing decisions and semantic localization ability of AMC.
The code and pretrained models are available at https://github.com/KevinLight831/AMC.
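
The data-dependent expert activation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three toy experts, the linear router, the `threshold` parameter, and all function names below are hypothetical and chosen only to show how a router might select and weight experts per query, so that different image-text pairs can follow different computational paths.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical experts with increasing levels of image-text interaction.
def expert_identity(img, txt):
    # No interaction: pass the image feature through unchanged.
    return list(img)


def expert_additive(img, txt):
    # Shallow interaction: element-wise sum of the two features.
    return [i + t for i, t in zip(img, txt)]


def expert_gated(img, txt):
    # Deeper interaction: the text produces a sigmoid gate on the image feature.
    return [i / (1.0 + math.exp(-t)) for i, t in zip(img, txt)]


EXPERTS = [expert_identity, expert_additive, expert_gated]


def make_router(dim, num_experts, seed=0):
    """Random linear router weights (assumption: a real router would be learned)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(num_experts)]


def route(router_weights, img, txt, threshold=0.2):
    """Return {expert_index: weight} for the experts activated by this query."""
    query = list(img) + list(txt)
    logits = [sum(w * q for w, q in zip(row, query)) for row in router_weights]
    gates = softmax(logits)
    active = [k for k, g in enumerate(gates) if g >= threshold]
    if not active:
        # Always keep at least the highest-scoring expert.
        active = [max(range(len(gates)), key=gates.__getitem__)]
    norm = sum(gates[k] for k in active)
    return {k: gates[k] / norm for k in active}


def compose(router_weights, img, txt, threshold=0.2):
    """Fuse image and text features using only the activated experts."""
    weights = route(router_weights, img, txt, threshold)
    fused = [0.0] * len(img)
    for k, w in weights.items():
        out = EXPERTS[k](img, txt)
        fused = [f + w * o for f, o in zip(fused, out)]
    return fused, sorted(weights)
```

Calling `compose(make_router(4, 3), img, txt)` with two 4-dimensional feature lists returns the fused feature together with the indices of the experts that were activated, so the set of executed experts (the "path") can differ from one query to the next.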

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Publication Date: May 30, 2023
Citations: 4