Abstract

While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner.

Highlights

  • Sequential inductive transfer learning (Pan and Yang, 2010; Ruder, 2019) consists of two stages: pretraining, in which the model learns a general-purpose representation of inputs, and adaptation, in which the representation is transferred to a new task

  • We evaluate on a diverse set of target tasks: named entity recognition (NER), sentiment analysis (SA), and three sentence pair tasks, natural language inference (NLI), paraphrase detection (PD), and semantic textual similarity (STS)

  • For ELMo, we find the largest differences for sentence pair tasks, where feature extraction consistently outperforms fine-tuning

Summary

Introduction

Sequential inductive transfer learning (Pan and Yang, 2010; Ruder, 2019) consists of two stages: pretraining, in which the model learns a general-purpose representation of inputs, and adaptation, in which the representation is transferred to a new task. Most previous work in NLP has focused on pretraining objectives for learning word or sentence representations (Mikolov et al., 2013; Kiros et al., 2015). In feature extraction, the model’s weights are ‘frozen’ and the pretrained representations are used in a downstream model, similar to classic feature-based approaches (Koehn et al., 2003). Alternatively, a pretrained model’s parameters can be unfrozen and fine-tuned on a new task (Dai and Le, 2015). Both have benefits: feature extraction enables the use of task-specific model architectures and may be computationally cheaper, since features only need to be computed once, while fine-tuning makes it convenient to adapt a general-purpose representation to many different tasks.
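
To make the two adaptation paradigms concrete, below is a minimal PyTorch sketch (not the paper's released code): PretrainedEncoder, TaskHead, and the learning rates are illustrative placeholders standing in for a pretrained contextual encoder (e.g. an ELMo- or BERT-style model) and a task-specific classifier.

    import torch
    import torch.nn as nn


    class PretrainedEncoder(nn.Module):
        """Stand-in for a pretrained contextual encoder."""

        def __init__(self, vocab_size=10000, dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.lstm = nn.LSTM(dim, dim, batch_first=True)

        def forward(self, token_ids):
            out, _ = self.lstm(self.embed(token_ids))
            return out  # (batch, seq_len, dim) contextual representations


    class TaskHead(nn.Module):
        """Task-specific classifier trained on top of the encoder."""

        def __init__(self, dim=128, num_labels=2):
            super().__init__()
            self.classifier = nn.Linear(dim, num_labels)

        def forward(self, features):
            return self.classifier(features.mean(dim=1))  # mean-pool over tokens


    encoder, head = PretrainedEncoder(), TaskHead()

    # Feature extraction: pretrained weights are frozen; only the task head is trained.
    for p in encoder.parameters():
        p.requires_grad = False
    feature_extraction_optim = torch.optim.Adam(head.parameters(), lr=1e-3)

    # Fine-tuning: all parameters, pretrained and task-specific, are updated jointly.
    for p in encoder.parameters():
        p.requires_grad = True
    fine_tuning_optim = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=2e-5
    )

The only structural difference is which parameters the optimizer sees: in feature extraction only the head's parameters are optimized, while in fine-tuning the encoder's parameters are included as well (typically with a smaller learning rate); the specific values above are placeholders, not the paper's hyperparameters.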
