CPT: Colorful Prompt Tuning for pre-trained vision-language models

Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

doi:10.1016/j.aiopen.2024.01.004

Abstract

Vision-Language Pre-training (VLP) models have shown promising capabilities in grounding natural language in image data, facilitating a broad range of cross-modal tasks. However, there is a significant gap between the objective forms used in model pre-training and in fine-tuning, so large amounts of labeled data are needed to stimulate the visual grounding capability of VLP models for downstream tasks. To address this challenge, we present Color-based Prompt Tuning (CPT), a novel paradigm for tuning VLP models that reformulates visual grounding into a fill-in-the-blank problem with color-based co-referential markers in image and text, maximally mitigating the gap. In this way, CPT enables strong few-shot and even zero-shot visual grounding capabilities of VLP models. Comprehensive experimental results show that CPT achieves state-of-the-art performance on zero/few-shot visual grounding (e.g., 75.1 zero-shot accuracy on the RefCOCO evaluation), outperforming fine-tuned and other prompt-tuned models by a large margin. Moreover, CPT can be easily extended to achieve promising zero/few-shot performance on other vision-language tasks, such as visual relation detection, visual commonsense reasoning, and visual question answering. We make the data and code publicly available at https://github.com/thunlp/CPT.
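To make the fill-in-the-blank reformulation concrete, here is a minimal Python sketch, assuming each candidate image region is marked with a distinct color and the model is asked to predict the color word of the region the query refers to. The palette, the prompt wording in `build_prompt`, and `mask_scorer` (a hypothetical stand-in for the VLP model's masked-token distribution over color words) are illustrative assumptions, not the code released in the repository above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
RGB = Tuple[int, int, int]

# Hypothetical palette: each candidate region gets one color. In a real
# pipeline the RGB value would be drawn as a translucent overlay on the image,
# and the color name is the word the model must produce in the text.
PALETTE: List[Tuple[str, RGB]] = [
    ("red", (240, 0, 30)),
    ("blue", (0, 10, 255)),
    ("green", (0, 255, 0)),
    ("yellow", (255, 255, 25)),
]

# Hypothetical interface: mask_scorer(colored_regions, prompt) returns
# {color word: probability of that word at the [MASK] position}.
MaskScorer = Callable[[Dict[str, Box], str], Dict[str, float]]


def build_prompt(query: str) -> str:
    """Turn a referring expression into a fill-in-the-blank prompt (assumed wording)."""
    return f"{query} is in [MASK] color."


def assign_colors(regions: Sequence[Box]) -> Dict[str, Box]:
    """Map each color word to the region it marks, one color per region."""
    if len(regions) > len(PALETTE):
        raise ValueError("more regions than colors; score them in batches")
    return {name: box for (name, _rgb), box in zip(PALETTE, regions)}


def ground(query: str, regions: Sequence[Box], mask_scorer: MaskScorer) -> Box:
    """Return the region whose color word the model finds most likely at [MASK]."""
    colored = assign_colors(regions)
    probs = mask_scorer(colored, build_prompt(query))
    # Only colors actually assigned to a region are valid answers.
    best = max(colored, key=lambda color: probs.get(color, 0.0))
    return colored[best]


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (20, 20, 40, 40), (50, 0, 60, 10)]

    # Toy stand-in for a VLP model's masked-token distribution.
    def toy_scorer(colored: Dict[str, Box], prompt: str) -> Dict[str, float]:
        return {"red": 0.1, "blue": 0.7, "green": 0.2}

    print(ground("the man on the left", boxes, toy_scorer))  # prints (20, 20, 40, 40)
```

Because the answer is a word from the model's existing vocabulary, this formulation can reuse the masked-language-modeling head from pre-training instead of a newly initialized grounding head; that reuse is the intuition behind the zero-shot setting described in the abstract.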

Journal: AI Open
Publication Date: Jan 1, 2024
Citations: 20
License type: cc-by-nc-nd