Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning

Runxin Xu, Songfang Huang, Chuanqi Tan, Fuli Luo, Baobao Chang, Zhiyuan Zhang, Fei Huang

doi:10.18653/v1/2021.emnlp-main.749

Abstract

Recent pretrained language models have grown from millions to billions of parameters, so many downstream tasks now require fine-tuning an extremely large pretrained model on a limited training corpus. In this paper, we propose Child-Tuning, a straightforward yet effective fine-tuning technique that updates only a subset of the parameters of a large pretrained model (called the child network) by strategically masking out the gradients of the non-child network during the backward pass. Experiments on various downstream tasks in the GLUE benchmark show that Child-Tuning consistently outperforms vanilla fine-tuning by 1.5 to 8.6 points in average score across four different pretrained models, and surpasses prior fine-tuning techniques by 0.6 to 1.3 points. Furthermore, empirical results on domain transfer and task transfer show that Child-Tuning achieves better generalization by large margins.
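To make the gradient-masking idea concrete, here is a minimal sketch in plain Python, not the authors' implementation. The abstract only says the non-child gradients are masked "strategically"; this sketch assumes the simplest choice, a random Bernoulli mask redrawn at every step with keep probability `keep_prob`, a plain SGD update, and a 1 / `keep_prob` rescaling of the kept gradients. The function name `child_tuning_sgd_step` and all of its parameters are hypothetical.

```python
import random
from typing import List


def child_tuning_sgd_step(
    weights: List[float],
    grads: List[float],
    lr: float,
    keep_prob: float,
    rng: random.Random,
) -> List[float]:
    """One SGD step that updates only a "child" subset of the parameters.

    Hypothetical helper (assumptions, not from the paper's abstract):
    - the child network is chosen by a random Bernoulli(keep_prob) mask
      that is redrawn on every call;
    - kept gradients are rescaled by 1 / keep_prob so the expected update
      matches ordinary SGD;
    - the optimizer is plain SGD with learning rate `lr`.
    Parameters outside the child network receive a zero gradient and are
    therefore left unchanged by this step.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if len(weights) != len(grads):
        raise ValueError("weights and grads must have the same length")
    # Bernoulli mask: 1.0 marks a child-network parameter for this step.
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in grads]
    # Mask out non-child gradients, rescale the kept ones, apply SGD.
    return [
        w - lr * (g * m / keep_prob)
        for w, g, m in zip(weights, grads, mask)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    w = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, 0.5, 0.5, 0.5]
    # Each entry either stays put (non-child) or moves by lr * g / keep_prob.
    print(child_tuning_sgd_step(w, g, lr=0.1, keep_prob=0.5, rng=rng))
```

With `keep_prob=1.0` every parameter belongs to the child network and the step reduces to vanilla SGD. In a framework such as PyTorch, the same idea would amount to multiplying each parameter's `.grad` by a mask before `optimizer.step()`. With stateful optimizers such as Adam, however, a zeroed gradient does not by itself guarantee a zero update, because momentum terms carry over between steps.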

Publication Date: Jan 1, 2021
Citations: 30
License type: cc-by

Similar Papers

Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
...
21 Oct 2021

Expanding Large Pre-trained Unimodal Models with Multimodal Information Injection for Image-Text Multimodal Classification
Tao Liang ... Fengmao Lv
01 Jun 2022

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
Yiwen Tang ... Bin Zhao
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38
24 Mar 2024

Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain
...
arXiv (Cornell University)
22 May 2023