AMOM: Adaptive Masking over Masking for Conditional Masked Language Model

Yisheng Xiao, Min Zhang, Tie-Yan Liu, Tao Qin, Ruiyang Xu, Juntao Li, Lijun Wu

doi:10.1609/aaai.v37i11.26615

Abstract

Transformer-based autoregressive (AR) methods have achieved appealing performance for varied sequence-to-sequence generation tasks, e.g., neural machine translation, summarization, and code generation, but suffer from low inference efficiency. To speed up the inference stage, many non-autoregressive (NAR) strategies have been proposed in the past few years. Among them, the conditional masked language model (CMLM) is one of the most versatile frameworks, as it can support many different sequence generation scenarios and achieve very competitive performance on these tasks. In this paper, we further introduce a simple yet effective adaptive masking over masking strategy to enhance the refinement capability of the decoder and make the encoder optimization easier. Experiments on 3 different tasks (neural machine translation, summarization, and code generation) with 15 datasets in total confirm that our proposed simple method achieves significant performance improvement over the strong CMLM model. Surprisingly, our proposed model yields state-of-the-art performance on neural machine translation (34.62 BLEU on WMT16 EN to RO, 34.82 BLEU on WMT16 RO to EN, and 34.84 BLEU on IWSLT De to En) and even better performance than the AR Transformer on 7 benchmark datasets with at least 2.2x speedup. Our code is available at GitHub.

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 26, 2023
Citations: 2
