Abstract

Pivot-based neural representation models have led to significant progress in domain adaptation for NLP. However, previous research following this approach utilizes only labeled data from the source domain and unlabeled data from the source and target domains, but neglects to incorporate massive unlabeled corpora that are not necessarily drawn from these domains. To alleviate this, we propose PERL, a representation learning model that extends contextualized word embedding models such as BERT (Devlin et al., 2019) with pivot-based fine-tuning. PERL outperforms strong baselines across 22 sentiment classification domain adaptation setups, improves in-domain model performance, yields effective reduced-size models, and increases model stability.

Highlights

  • Natural Language Processing (NLP) algorithms are constantly improving, gradually approaching human-level performance (Dozat and Manning, 2017; Edunov et al., 2018; Radford et al., 2018).

  • We further present regularized PERL (R-PERL), which facilitates parameter sharing for pivots with similar meaning.

  • We further present a variant of PERL, denoted R-PERL, in which the non-contextualized embedding matrix of the BERT model trained at Step (1) is employed to regularize PERL during its fine-tuning stage (Step 2); see the sketch after this list.
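
The snippet below is a minimal sketch of the R-PERL regularization idea described in the highlight above, assuming a BERT encoder from the Hugging Face `transformers` library. The pivot list is a hypothetical placeholder, and the code is an illustration of the idea rather than the authors' implementation: rows of BERT's non-contextualized (input) embedding matrix that correspond to the pivots are copied into the pivot-prediction head and kept fixed, so pivots with similar pre-trained embeddings behave similarly during fine-tuning.

```python
# Sketch only: initialize the pivot-prediction head from BERT's
# non-contextualized embedding matrix and freeze it (R-PERL-style regularization).
import torch
from torch import nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

pivots = ["great", "excellent", "bad", "terrible"]            # hypothetical pivots
pivot_ids = torch.tensor(tokenizer.convert_tokens_to_ids(pivots))

# Rows of the pre-trained, non-contextualized word embedding matrix for the pivots.
with torch.no_grad():
    pivot_vectors = encoder.embeddings.word_embeddings.weight[pivot_ids]

# Pivot-prediction head whose weights are the pre-trained pivot embeddings,
# kept fixed so that fine-tuning is regularized toward the pre-trained space.
pivot_head = nn.Linear(encoder.config.hidden_size, len(pivots), bias=False)
pivot_head.weight.data.copy_(pivot_vectors)
pivot_head.weight.requires_grad = False
```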


Summary

Introduction

Natural Language Processing (NLP) algorithms are constantly improving, gradually approaching human-level performance (Dozat and Manning, 2017; Edunov et al., 2018; Radford et al., 2018). These algorithms often depend on the availability of large amounts of manually annotated data from the domain in which the task is performed. Our focus in this paper is on unsupervised domain adaptation (DA), the setup we consider most realistic: labeled data is available only from the source domain, while unlabeled data is available from both the source and the target domains. A common pipeline for this setup consists of two steps: (A) learning a representation model (often referred to as the encoder) using the source and target unlabeled data; and (B) training a supervised classifier on the source domain labeled data. Encoding with the representation model is performed both when the classifier is trained on the source domain and when it is applied to new text from the target domain.
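
To make the two-step pipeline concrete, here is a minimal sketch assuming a BERT encoder from the `transformers` library. The pivot list, example data, and heads are illustrative placeholders, and the pivot-masking objective shown here is an approximation of pivot-based fine-tuning rather than the paper's exact architecture or training procedure.

```python
# Sketch only: (A) pivot-based fine-tuning on unlabeled source + target text,
# then (B) a task classifier trained on source-domain labeled data.
import torch
from torch import nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

# Hypothetical pivot features: frequent in both domains and correlated
# with the source-domain labels (e.g. sentiment words).
pivots = ["great", "excellent", "bad", "terrible"]
pivot_ids = tokenizer.convert_tokens_to_ids(pivots)

# Step (A): mask pivot occurrences in unlabeled text and train the encoder
# to predict which pivot was masked, via a small head over the pivot set.
pivot_head = nn.Linear(encoder.config.hidden_size, len(pivots))

def pivot_mlm_loss(sentence):
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"].clone()
    labels = torch.full_like(input_ids, -100)        # ignore non-pivot positions
    for j, pid in enumerate(pivot_ids):
        hit = input_ids == pid
        labels[hit] = j                              # target: index of the masked pivot
        input_ids[hit] = tokenizer.mask_token_id     # mask the pivot token
    hidden = encoder(input_ids=input_ids,
                     attention_mask=enc["attention_mask"]).last_hidden_state
    logits = pivot_head(hidden)                      # (1, seq_len, n_pivots)
    return nn.functional.cross_entropy(
        logits.view(-1, len(pivots)), labels.view(-1), ignore_index=-100)

# Step (B): a supervised classifier on top of the fine-tuned encoder's [CLS]
# representation, trained with source-domain labels only; the same encoder is
# used to encode target-domain text at test time.
task_head = nn.Linear(encoder.config.hidden_size, 2)

def task_logits(sentence):
    enc = tokenizer(sentence, return_tensors="pt")
    cls = encoder(**enc).last_hidden_state[:, 0]     # [CLS] vector
    return task_head(cls)
```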
