Abstract
Parameter-efficient transfer learning (PETL) is an emerging research area that aims to adapt large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in reducing storage and computation costs. However, these methods do not take instance-specific visual cues into account for visual tasks. In this paper, we propose a Dynamic Visual Prompt Tuning framework (DVPT), which generates a dynamic, instance-wise token for each image. In this way, it captures the unique visual features of each image, making it better suited to downstream visual tasks. We design a Meta-Net module that generates learnable prompts conditioned on each image, thereby capturing dynamic instance-wise visual features. Extensive experiments on a wide range of downstream recognition tasks show that DVPT achieves superior performance over other PETL methods. More importantly, DVPT even outperforms full fine-tuning on 17 of 19 downstream tasks while maintaining high parameter efficiency. Our code will be released soon.
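The core idea, a lightweight Meta-Net that maps each image's features to an instance-specific prompt token, can be sketched as follows. This is a minimal illustration under assumed shapes and a simple two-layer bottleneck; the names (`MetaNet`, `feat_dim`, `prompt_dim`) and the NumPy implementation are hypothetical stand-ins, not the paper's actual architecture or code.

```python
import numpy as np

rng = np.random.default_rng(0)

class MetaNet:
    """Hypothetical bottleneck network: image feature -> one dynamic prompt token."""

    def __init__(self, feat_dim, hidden_dim, prompt_dim):
        # Small learnable weights (randomly initialized here for illustration).
        self.w1 = rng.standard_normal((feat_dim, hidden_dim)) * 0.02
        self.w2 = rng.standard_normal((hidden_dim, prompt_dim)) * 0.02

    def __call__(self, x):
        h = np.maximum(x @ self.w1, 0.0)  # ReLU bottleneck
        return h @ self.w2                # one prompt token per image

feat_dim = prompt_dim = 768               # assumed ViT-style embedding width
meta_net = MetaNet(feat_dim, hidden_dim=64, prompt_dim=prompt_dim)

# Stand-in features from a frozen backbone for a batch of 4 images.
image_feats = rng.standard_normal((4, feat_dim))
dynamic_prompts = meta_net(image_feats)   # shape (4, 768), differs per image

# A shared static prompt plus the per-image token yields an
# input-conditioned prompt, rather than one fixed prompt for all images.
static_prompt = np.zeros((1, prompt_dim))
prompts = static_prompt + dynamic_prompts
print(prompts.shape)
```

Because only the Meta-Net (and the prompt) would be trained while the backbone stays frozen, the tunable parameter count stays small, which is the parameter-efficiency property the abstract emphasizes.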