Abstract

The zero-shot classification performance of large-scale vision-language pre-training models (e.g., CLIP, BLIP, and ALIGN) can be enhanced by incorporating a prompt (e.g., “a photo of a [CLASS]”) before the class words. Even slight modifications to the prompt can have a significant effect on the classification outcomes of these models, so it is crucial to use an appropriate prompt tailored to the classes. However, manual prompt design is labor-intensive and requires domain-specific expertise. CoOp (Context Optimization) converts hand-crafted prompt templates into learnable word vectors to generate prompts automatically, yielding substantial improvements for CLIP. However, CoOp exhibits significant variation in classification performance across classes. Although CoOp-CSC (Class-Specific Context) learns a separate prompt for each class, it shows advantages only on fine-grained datasets. In this paper, we propose a novel automatic prompt generation method called F-SCP (Filter-based Specific Class Prompt), which differs from both the CoOp-UC (Unified Context) and CoOp-CSC models. Our approach focuses on prompt generation for low-accuracy classes and similar classes. We add the Filter and SCP modules to the prompt generation architecture: the Filter module selects the poorly classified classes, and the SCP (Specific Class Prompt) module then regenerates their prompts, replacing the prompts of those specific classes. Experimental results on six multi-domain datasets show the superiority of our approach over state-of-the-art methods. In particular, the accuracy improvement for these specific classes is substantial; for instance, compared with CoOp-UC on the OxfordPets dataset, the accuracy of low-accuracy classes such as Class21 and Class26 improves by 18% and 12%, respectively.
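To make the Filter-then-SCP flow concrete, below is a minimal Python sketch of the selection-and-replacement logic the abstract describes. The function names (filter_low_accuracy, specific_class_prompts), the accuracy threshold, and the toy class names and numbers are illustrative assumptions, not the paper's implementation; the actual SCP module would optimize class-specific context vectors rather than substitute text templates.

```python
from typing import Dict, List

def filter_low_accuracy(per_class_acc: Dict[str, float],
                        threshold: float = 0.6) -> List[str]:
    """Filter module (sketch): select classes whose per-class
    validation accuracy falls below a threshold. The 0.6 cutoff
    is an assumed hyperparameter, not taken from the paper."""
    return [cls for cls, acc in per_class_acc.items() if acc < threshold]

def specific_class_prompts(low_acc_classes: List[str],
                           base_prompts: Dict[str, str]) -> Dict[str, str]:
    """SCP module (sketch): regenerate a prompt for each poorly
    classified class and overwrite the shared prompt for that
    class only; well-classified classes keep their prompts."""
    prompts = dict(base_prompts)
    for cls in low_acc_classes:
        # Hypothetical class-specific template; the real module
        # learns the replacement rather than hard-coding it.
        prompts[cls] = f"a photo of a {cls}, a type of pet"
    return prompts

# Usage with toy numbers (not results from the paper):
per_class_acc = {"Abyssinian": 0.91, "Bengal": 0.42, "Birman": 0.55}
base = {c: f"a photo of a {c}" for c in per_class_acc}
low = filter_low_accuracy(per_class_acc)        # ['Bengal', 'Birman']
prompts = specific_class_prompts(low, base)     # only weak classes replaced
```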
