P-Laplacian Adaptation for Generative Pre-trained Vision-Language Models

Haoyuan Wu, Xufeng Yao, Peng Xu, Peiyu Liao, Xinyun Zhang, Bei Yu

doi:10.1609/aaai.v38i6.28415

Abstract

Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks. In light of the rapidly increasing size of pre-trained VLMs, parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning. One such approach is the adapter, which introduces a few trainable parameters into the pre-trained models while preserving the original parameters during adaptation. In this paper, we present a novel modeling framework that recasts adapter tuning after attention as a graph message passing process on attention graphs, where the projected query and value features and attention matrix constitute the node features and the graph adjacency matrix, respectively. Within this framework, tuning adapters in VLMs necessitates handling heterophilic graphs, owing to the disparity between the projected query and value space. To address this challenge, we propose a new adapter architecture, p-adapter, which employs p-Laplacian message passing in Graph Neural Networks (GNNs). Specifically, the attention weights are re-normalized based on the features, and the features are then aggregated using the calibrated attention matrix, enabling the dynamic exploitation of information with varying frequencies in the heterophilic attention graphs. We conduct extensive experiments on different pre-trained VLMs and multi-modal tasks, including visual question answering, visual entailment, and image captioning. The experimental results validate our method's significant superiority over other PETL methods. Our code is available at https://github.com/wuhy68/p-Adapter/.
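
The re-normalize-then-aggregate step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the abstract does not give the exact formulas, so everything here is an assumption: the function name `calibrated_attention`, the use of the Euclidean distance between projected query and value features, the `eps` stabilizer, and the plain row re-normalization are illustrative choices. The degree normalization and residual terms found in p-Laplacian message passing for GNNs are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def calibrated_attention(attn, q, v, p=2.0, eps=1e-6):
    """Hypothetical sketch of feature-based attention re-normalization.

    attn: (n_q, n_v) row-stochastic attention matrix (the graph adjacency).
    q:    (n_q, d) projected query features (one set of node features).
    v:    (n_v, d) projected value features (the other set of node features).
    p:    exponent of the p-Laplacian-style reweighting (assumed form).
    """
    # Pairwise Euclidean distance between every query and value feature.
    diff = q[:, None, :] - v[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Reweight each edge by distance^(p - 2); eps avoids division by zero.
    weights = attn * (dist + eps) ** (p - 2.0)

    # Re-normalize so each row is again a probability distribution.
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Aggregate value features with the calibrated attention matrix.
    return weights, weights @ v


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
v = rng.normal(size=(5, 8))
attn = softmax(q @ v.T / np.sqrt(8))

weights, out = calibrated_attention(attn, q, v, p=1.5)
print(weights.shape, out.shape)
```

In this sketch, `p = 2` leaves the attention matrix unchanged, so the output equals the ordinary attention aggregation `attn @ v`. Values of `p` below 2 shift weight toward query-value pairs with similar features, and values above 2 shift it toward dissimilar pairs, which is one simple way to picture how a single parameter could modulate the aggregation on a heterophilic attention graph.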
