Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

Hao Tan, Jun Wan, Yizhuang Zhou, Zhen Lei, Xiangyu Zhang, Jun Li

doi:10.1609/aaai.v38i5.28311

Abstract

Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable generalization to downstream tasks. However, existing prompt-tuning frameworks must process learnable textual inputs for all categories in parallel, which consumes massive GPU memory when the target dataset has a large number of categories. Moreover, previous works require category names to be included within prompts, and they perform poorly when category names are ambiguous. To address these shortcomings, we propose Compound Text-Guided Prompt Tuning (TGP-T), which significantly reduces resource demand while achieving superior performance. We introduce text supervision into the optimization of prompts, which brings two benefits: 1) it releases the model's reliance on pre-defined category names during inference, thereby enabling more flexible prompt generation; 2) it reduces the number of inputs to the text encoder, which decreases GPU memory consumption significantly. Specifically, we find that compound text supervision, i.e., category-wise and content-wise, is highly effective, since the two forms provide inter-class separability and capture intra-class variations, respectively. Moreover, we condition the prompt generation on visual features through a module called Bonder, which facilitates the alignment between prompts and visual features. Extensive experiments on few-shot recognition and domain generalization demonstrate that TGP-T achieves superior performance with consistently lower training costs. It reduces GPU memory usage by 93% and attains a 2.5% performance gain on 16-shot ImageNet. The code is available at https://github.com/EricTan7/TGP-T.
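The memory argument in the abstract rests on how many token sequences the text encoder must process per forward pass. The following is a back-of-the-envelope sketch of that scaling, not the authors' implementation or measurement. It assumes CLIP's context length of 77 tokens and a text embedding width of 512; the function names, the batch size of 32, and the choice of two prompts per image (one per kind of text supervision) are hypothetical and chosen only for illustration.

```python
# Rough count of embedding values entering the text encoder per forward pass.
# All names and default values here are illustrative assumptions.

def text_encoder_input_elements(num_sequences, context_len=77, width=512):
    """Scalar embedding values fed to the text encoder for num_sequences prompts."""
    return num_sequences * context_len * width


def class_parallel_sequences(num_classes):
    # Class-parallel prompt tuning: one learnable prompt per category,
    # all of them encoded together, so the count grows with the label set.
    return num_classes


def image_conditioned_sequences(batch_size, prompts_per_image=2):
    # Hypothetical image-conditioned setup: a fixed number of prompts per image,
    # independent of how many categories the dataset has.
    return batch_size * prompts_per_image


if __name__ == "__main__":
    per_class = text_encoder_input_elements(class_parallel_sequences(1000))
    per_image = text_encoder_input_elements(image_conditioned_sequences(32))
    print("class-parallel input elements:   ", per_class)
    print("image-conditioned input elements:", per_image)
    print("ratio:", per_class / per_image)
```

The count covers only the encoder inputs; actual GPU memory also depends on activations, gradients, and optimizer state, so the ratio printed here should not be read as a reproduction of the reported 93% reduction.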


More From: Proceedings of the AAAI Conference on Artificial Intelligence
