Do you have the right scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods

Ning Miao, Hao Zhou, Lei Li, Yuxuan Song

doi:10.18653/v1/2020.acl-main.314

Abstract

It has become a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimation problems. In this paper, we propose MC-Tailor, a novel method to alleviate this issue in text generation tasks by truncating and transferring probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach.
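
To make "truncating and transferring probability mass" concrete, here is a toy sketch over a four-sentence space. It assumes a known real distribution and a fine-tuned model distribution (both invented for illustration), computes the density ratio r(x) = p_real(x) / p_model(x), caps it at a hypothetical threshold `cap`, and renormalizes q(x) proportional to p_model(x) * min(r(x), cap). The helper `tailor` and all numbers are hypothetical; this illustrates the general idea, not the paper's exact procedure.

```python
def tailor(p_model, p_real, cap):
    """Reweight p_model by the capped density ratio and renormalize (toy sketch)."""
    weights = {}
    for x, pm in p_model.items():
        ratio = p_real[x] / pm  # > 1: under-estimated by the model; < 1: over-estimated
        weights[x] = pm * min(ratio, cap)  # truncate very large ratios
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# Hypothetical distributions over four "sentences".
p_real = {"a": 0.40, "b": 0.30, "c": 0.20, "d": 0.10}
p_model = {"a": 0.10, "b": 0.30, "c": 0.20, "d": 0.40}  # "d" over-, "a" under-estimated

q = tailor(p_model, p_real, cap=2.0)
for x in sorted(q):
    print(x, round(p_model[x], 3), "->", round(q[x], 3))
```

Mass moves away from the over-estimated "d" toward the under-estimated "a". With a very large cap, the reweighting recovers p_real exactly in this toy case; a finite cap only partially corrects under-estimated regions but keeps the weights bounded.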

Highlights

Pre-trained language models (PLMs), e.g. GPT-2 (Radford et al, 2019), have shown great promise in many natural language generation applications, such as stylized text generation (Syed et al, 2019) and dialog systems (Wolf et al, 2019)
MC-Tailor is composed of a ratio estimator, which detects over- and under-estimated regions of model distributions, and the Early Rejection Sampling algorithm (ERS), which accelerates sampling while ensuring sample quality
We find that MC-Tailor achieves significantly lower Rev-PPLs than the fine-tuning baseline on datasets of different sizes, from Ontonotes-mz with only 7k training samples to the relatively large Switchboard dataset with more than 200k samples
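
One standard way to build a ratio estimator is the classifier-based density-ratio trick: train a binary classifier D(x) to separate real sentences (label 1) from model samples (label 0); then p_real(x) / p_model(x) is approximately D(x) / (1 - D(x)) * n_model / n_real. The sketch below assumes bag-of-words features and plain logistic regression trained by full-batch gradient descent; the helpers `featurize`, `train_logreg`, and `density_ratio` and the tiny corpora are hypothetical, and the paper's estimator architecture may differ.

```python
import math

def featurize(sentence, vocab):
    """Bag-of-words count vector with a leading bias feature."""
    tokens = sentence.split()
    return [1.0] + [float(tokens.count(w)) for w in vocab]

def train_logreg(xs, ys, lr=0.5, steps=2000):
    """Full-batch gradient descent on the logistic (cross-entropy) loss."""
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # D(x): probability that x is real
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def density_ratio(sentence, w, vocab, n_real, n_model):
    """Estimate p_real(x) / p_model(x) from the classifier logit."""
    z = sum(wi * xi for wi, xi in zip(w, featurize(sentence, vocab)))
    # D / (1 - D) = exp(z); the factor n_model / n_real corrects for class priors.
    return math.exp(z) * n_model / n_real

# Hypothetical corpora: real email-like sentences vs. samples from a fine-tuned model.
real = ["dear team thanks", "dear all thanks", "thanks for the update", "dear team see attached"]
model = ["lol see you", "see you soon lol", "thanks lol", "dear lol"]
vocab = sorted({t for s in real + model for t in s.split()})
xs = [featurize(s, vocab) for s in real + model]
ys = [1] * len(real) + [0] * len(model)
w = train_logreg(xs, ys)
for s in ["dear team thanks", "see you soon lol"]:
    print(s, round(density_ratio(s, w, vocab, len(real), len(model)), 3))
```

An estimated ratio above 1 marks a region the model under-estimates; below 1, a region it over-estimates.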

Summary

Introduction

Pre-trained language models (PLMs), e.g. GPT-2 (Radford et al, 2019), have shown great promise in many natural language generation applications, such as stylized text generation (Syed et al, 2019) and dialog systems (Wolf et al, 2019). For example, to generate email-domain sentences with a pre-trained GPT-2 model, we usually need to fine-tune GPT-2 on a small email-domain corpus. However, fine-tuning on small datasets often leads to a mismatch between the real and model distributions: the model over-estimates some regions and under-estimates others. To address this over- and under-estimation problem, in this paper, we propose MC-Tailor, which tailors the density of the fine-tuned model distribution by cutting probability mass from over-estimated zones and transferring it to under-estimated zones, leading to a more realistic model distribution after fine-tuning. MC-Tailor consists of a ratio estimator that detects over- and under-estimated regions of the model distribution, and an early rejection sampling (ERS) component that tailors (reassigns) probability mass and efficiently obtains sampled sentences from the model distribution. Empirical results show that MC-Tailor generates significantly better samples than fine-tuning, and that the resulting model distributions are closer to the real data distributions.
© 2020 Association for Computational Linguistics
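
Rejection sampling turns a density ratio into samples: draw x from the fine-tuned model and accept it with some probability derived from r(x) (a common choice is min(r(x) / M, 1) for a constant M). Waiting until a full sentence is decoded wastes computation on candidates that are later rejected. The sketch below illustrates rejecting early under an assumption that is ours, not the paper's: the acceptance weight factorizes over decoding steps into per-step weights a_t in [0, 1], so accepting step by step and restarting at the first failure is equivalent to one accept/reject with the product probability. The toy next-token table `NEXT_TOKEN`, the weights `STEP_ACCEPT`, and `early_rejection_sample` are hypothetical; the paper's ERS acceptance rule may differ.

```python
import random

# Hypothetical next-token distribution of a fine-tuned model (prefix-independent for brevity).
NEXT_TOKEN = {"hi": 0.5, "lol": 0.3, "<eos>": 0.2}

# Hypothetical per-step acceptance weights in [0, 1]; "lol" is treated as over-estimated.
STEP_ACCEPT = {"hi": 1.0, "lol": 0.2, "<eos>": 1.0}

def sample_token(rng):
    """Draw one token from the toy model distribution."""
    tokens, probs = zip(*NEXT_TOKEN.items())
    return rng.choices(tokens, weights=probs, k=1)[0]

def early_rejection_sample(rng, max_len=10):
    """Return (accepted tokens, tokens generated across all attempts)."""
    generated = 0
    while True:
        prefix = []
        rejected = False
        for _ in range(max_len):
            tok = sample_token(rng)
            generated += 1
            if rng.random() >= STEP_ACCEPT[tok]:
                rejected = True  # stop decoding this candidate immediately
                break
            if tok == "<eos>":
                break
            prefix.append(tok)
        if not rejected:
            return prefix, generated

rng = random.Random(0)
samples = [early_rejection_sample(rng)[0] for _ in range(2000)]
tokens = [t for s in samples for t in s]
print(f"fraction of 'lol' tokens after tailoring: {tokens.count('lol') / len(tokens):.3f}")
```

Because each step's acceptance is an independent Bernoulli draw, a candidate survives with probability equal to the product of its step weights, so accepted sentences follow q(x) proportional to p(x) times that product; the saving is that doomed candidates stop early instead of being decoded to the end. In this toy setup, "lol" falls from 0.3 / 0.8 = 0.375 of non-end tokens under the model to roughly 0.06 / 0.56 (about 0.11) after tailoring.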

Pre-Trained Language Model

Proposed MC-Tailor

Ratio Estimator

Efficient Sampling

Experimental Setup

Experimental Results

Methods

Conclusion

Publication Date: Jan 1, 2020
Citations: 13
License type: cc-by

Similar Papers

Arabic abstractive text summarization using RNN-based and transformer-based architectures
Mohammad Bani-Almarjeh ... Mohamad-Bassam Kurdy
Information Processing & Management | VOL. 60
26 Dec 2022

Data-Efficient Information Extraction from Documents with Pre-trained Language Models
Clément Sage ... Christophe Garcia
01 Jan 2020

Learning to generate text with auxiliary tasks
Pham Quoc-Hung ... Xuan-Hieu Phan
Knowledge-Based Systems | VOL. 304
01 Oct 2024

Learning to Transfer Prompts for Text Generation
Junyi Li ... Tianyi Tang
01 Jan 2021
