Abstract

While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner.

Highlights

  • Sequential inductive transfer learning (Pan and Yang, 2010; Ruder, 2019) consists of two stages: pretraining, in which the model learns a general-purpose representation of inputs, and adaptation, in which the representation is transferred to a new task

  • We evaluate on a diverse set of target tasks: named entity recognition (NER), sentiment analysis (SA), and three sentence pair tasks, natural language inference (NLI), paraphrase detection (PD), and semantic textual similarity (STS)

  • For ELMo, we find the largest differences for sentence pair tasks, where feature extraction consistently outperforms fine-tuning

Summary

Introduction

Sequential inductive transfer learning (Pan and Yang, 2010; Ruder, 2019) consists of two stages: pretraining, in which the model learns a general-purpose representation of inputs, and adaptation, in which the representation is transferred to a new task. Most previous work in NLP has focused on pretraining objectives for learning word or sentence representations (Mikolov et al., 2013; Kiros et al., 2015). In feature extraction, the model’s weights are ‘frozen’ and the pretrained representations are used in a downstream model, similar to classic feature-based approaches (Koehn et al., 2003). Alternatively, a pretrained model’s parameters can be unfrozen and fine-tuned on a new task (Dai and Le, 2015). Both have benefits: feature extraction enables the use of task-specific model architectures and may be computationally cheaper, since features only need to be computed once, while fine-tuning makes it convenient to adapt a general-purpose representation to many different tasks.
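
To make the two adaptation paradigms concrete, below is a minimal PyTorch sketch (not the paper's released code): PretrainedEncoder, TaskHead, and the learning rates are illustrative placeholders standing in for a pretrained contextual encoder (e.g. an ELMo- or BERT-style model) and a task-specific classifier.

    import torch
    import torch.nn as nn


    class PretrainedEncoder(nn.Module):
        """Stand-in for a pretrained contextual encoder."""

        def __init__(self, vocab_size=10000, dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.lstm = nn.LSTM(dim, dim, batch_first=True)

        def forward(self, token_ids):
            out, _ = self.lstm(self.embed(token_ids))
            return out  # (batch, seq_len, dim) contextual representations


    class TaskHead(nn.Module):
        """Task-specific classifier trained on top of the encoder."""

        def __init__(self, dim=128, num_labels=2):
            super().__init__()
            self.classifier = nn.Linear(dim, num_labels)

        def forward(self, features):
            return self.classifier(features.mean(dim=1))  # mean-pool over tokens


    encoder, head = PretrainedEncoder(), TaskHead()

    # Feature extraction: pretrained weights are frozen; only the task head is trained.
    for p in encoder.parameters():
        p.requires_grad = False
    feature_extraction_optim = torch.optim.Adam(head.parameters(), lr=1e-3)

    # Fine-tuning: all parameters, pretrained and task-specific, are updated jointly.
    for p in encoder.parameters():
        p.requires_grad = True
    fine_tuning_optim = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=2e-5
    )

The only structural difference is which parameters the optimizer sees: in feature extraction only the head's parameters are optimized, while in fine-tuning the encoder's parameters are included as well (typically with a smaller learning rate); the specific values above are placeholders, not the paper's hyperparameters.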
