Abstract
It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimate problem. In this paper, we propose MC-Tailor, a novel method to alleviate the above issue in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach.
Highlights
Pre-trained language models (PLM), e.g. GPT-2 (Radford et al, 2019), have shown great promise in many applications of natural language generation, such as stylized text generation (Syed et al, 2019) and dialog system (Wolf et al, 2019)
MC-Tailor is composed of a ratio estimator, which detects overand under-estimate regions of model distributions, and the Early Rejection Sampling algorithm (ERS), which accelerates sampling while ensuring sample quality
We find that MC-Tailor significantly reduces Rev-PPLs than fine-tuning baseline in data sets of different sizes, from Ontonotes-mz with only 7k training samples to relatively large Switchboard data set with more than 200k samples
Summary
Pre-trained language models (PLM), e.g. GPT-2 (Radford et al, 2019), have shown great promise in many applications of natural language generation, such as stylized text generation (Syed et al, 2019) and dialog system (Wolf et al, 2019). Given a pre-trained GPT-2 model, to generate sentences of email domain, we always need to fine-tune the GPT-2 on a small set of email domain corpus. Case of fine-tuning on small datasets, which always leads to the mismatch problem of the real and model distributions. To address the over- and under-estimated problem, in this paper, we propose MC-Tailor, which can tailor the resulting density of model distribution by cutting the probability mass of over-estimated zones to under-estimated zones, leading to more realistic model distribution after fine-tuning. C 2020 Association for Computational Linguistics estimated regions of model distribution; and an early rejection sampling (ERS) component to tailor (reassign) probability mass and efficiently obtain sampled sentences from the model distribution. Empirical results show that MC-Tailor can generate significantly better samples than finetuning, and the resulting model distributions of our model are closer to real data distributions
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.