Abstract

Masked language models and autoregressive language models are two types of language models. While pretrained masked language models such as BERT dominate the line of natural language understanding (NLU) tasks, autoregressive language models such as GPT are especially capable at natural language generation (NLG). In this paper, we propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM). We implement a specific PMLM with a uniform prior distribution on the masking ratio, named u-PMLM. We prove that u-PMLM is equivalent to an autoregressive permutated language model. One main advantage of the model is that it supports text generation in arbitrary order with surprisingly good quality, which could potentially enable new applications beyond traditional unidirectional generation. In addition, the pretrained u-PMLM outperforms BERT on a set of downstream NLU tasks.
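As a rough illustration of the probabilistic masking scheme described above, the sketch below draws a masking ratio from a uniform prior for each training sequence and masks that fraction of tokens. The token IDs and the MASK_ID constant are hypothetical placeholders, not the paper's implementation.

import random

MASK_ID = 103  # hypothetical [MASK] token id (BERT-style vocabulary); an assumption, not from the paper

def probabilistic_mask(token_ids):
    """Mask a sequence with a masking ratio drawn from Uniform(0, 1).

    A conventional masked LM uses a fixed ratio (e.g. 15%); under a uniform
    prior the model is trained on everything from nearly unmasked to nearly
    fully masked sequences.
    """
    ratio = random.random()                            # masking ratio ~ Uniform(0, 1)
    n_mask = max(1, round(ratio * len(token_ids)))     # how many positions to mask
    positions = random.sample(range(len(token_ids)), n_mask)
    masked = list(token_ids)
    for pos in positions:
        masked[pos] = MASK_ID                          # replace the selected tokens with [MASK]
    return masked, sorted(positions)                   # prediction targets are the original tokens at these positions

# Toy usage with made-up token ids:
tokens = [7592, 1010, 2129, 2024, 2017, 1029]
masked_seq, target_positions = probabilistic_mask(tokens)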

Highlights

  • Large-scale pretrained language models (Raffel et al., 2019; Wang et al., 2019; Lan et al., 2019; Liu et al., 2019; Jiao et al., 2019) have attracted considerable research attention, as these models have brought significant improvements to many natural language understanding (NLU) and natural language generation (NLG) tasks.

  • We prove that u-PMLM, a probabilistically masked language model (PMLM) with a uniform prior on the masking ratio, learns an autoregressive language model over random permutations of the training sequences.

  • We prove that u-PMLM is equivalent to the autoregressive permutated language model (APLM) by recombining the factorized log-likelihood function; the APLM is essentially an autoregressive language model trained on all possible permutations of the training instances (one way to write this objective is sketched below).
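As a hedged sketch of that objective, in our own notation (the paper's exact symbols may differ): for a sequence x of length N and a uniformly random permutation sigma of the positions, the APLM log-likelihood can be written in LaTeX as

\mathcal{L}_{\mathrm{APLM}}(\mathbf{x}) \;=\; \mathbb{E}_{\sigma}\!\left[ \sum_{n=1}^{N} \log p\!\left( x_{\sigma(n)} \,\middle|\, x_{\sigma(1)}, \ldots, x_{\sigma(n-1)} ;\, \theta \right) \right]

That is, each permuted order contributes an ordinary left-to-right factorization, and the model is trained on the expectation over all such orders.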

Summary

Introduction

Large-scale pretrained language models (Raffel et al., 2019; Wang et al., 2019; Lan et al., 2019; Liu et al., 2019; Jiao et al., 2019) have attracted considerable research attention, as these models have brought significant improvements to many NLU and NLG tasks. Unlike a masked language model, which predicts masked tokens given bidirectional context, an autoregressive language model learns a sequential generative process over text sequences and therefore naturally performs better at natural language generation. u-PMLM, in contrast, can generate text in an arbitrary order, with contextual constraints on both sides of each predicted word; this is very challenging for conventional generation models, since fluency and coherence are hard to guarantee under such bidirectional constraints. In addition, u-PMLM outperforms BERT significantly on the GLUE benchmark for natural language understanding.
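A minimal sketch of arbitrary-order generation by iterative mask filling, assuming a BERT-style masked-LM interface. Since a public u-PMLM checkpoint is not assumed to be available, the publicly released bert-base-uncased model stands in here purely to illustrate the decoding loop.

import random
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Stand-in model: bert-base-uncased is used only to illustrate the loop;
# it is not the u-PMLM checkpoint described in the paper.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def generate_arbitrary_order(prompt, num_new_tokens=8):
    """Append [MASK] slots to the prompt and fill them one at a time in a random order."""
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"][0].tolist()
    insert_at = len(ids) - 1                              # keep the final [SEP] in place
    ids[insert_at:insert_at] = [tokenizer.mask_token_id] * num_new_tokens
    slots = list(range(insert_at, insert_at + num_new_tokens))
    random.shuffle(slots)                                 # an arbitrary generation order

    for pos in slots:
        with torch.no_grad():
            logits = model(torch.tensor([ids])).logits    # shape (1, seq_len, vocab_size)
        ids[pos] = int(logits[0, pos].argmax())           # greedily fill the chosen slot
    return tokenizer.decode(ids, skip_special_tokens=True)

print(generate_arbitrary_order("The weather today is"))

In the paper, this kind of loop is driven by u-PMLM itself, which is trained with the uniform masking prior and therefore handles many simultaneous masks far better than vanilla BERT.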

Transformer
Autoregressive Language Model
Masked Language Model
Probabilistically Masked Language Model
Model Formulation
Generation with u-PMLM
Training Settings
Comparative Models
Autoregressive Generation
Natural Language Understanding
Non-traditional Text Generation
Conclusion
A Proof of Equivalence
B Generation Examples of u-PMLM and BERT