Abstract

This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect, because the common previous methods for incorporating an MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of inputs to a GEC model can differ considerably (erroneous, clumsy, etc.) from that of the corpora used to pre-train MLMs; however, previous methods do not address this issue. Our experiments show that our proposed method, in which we first fine-tune an MLM with a given GEC corpus and then use the output of the fine-tuned MLM as additional features in the GEC model, maximizes the benefit of the MLM. The best-performing model achieves state-of-the-art performance on the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at: https://github.com/kanekomasahiro/bert-gec.
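To make the fine-tuning step described above concrete, the sketch below shows one plausible way to adapt BERT to the distribution of learner text before using it as a feature extractor. It is a minimal illustration using the Hugging Face transformers API, not the authors' exact recipe: the file name gec_train.src, the hyperparameters, and the choice of continuing masked-LM training on the erroneous source sentences are assumptions (the paper also considers fine-tuning with a grammatical error detection objective).

```python
# Minimal sketch (hypothetical): continue BERT's masked-LM training on the
# erroneous source sentences of a GEC corpus, so its representations better
# match the distribution of learner text.
import torch
from torch.utils.data import DataLoader
from transformers import BertTokenizerFast, BertForMaskedLM, DataCollatorForLanguageModeling

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

# Hypothetical file: one erroneous source sentence per line from the GEC training set.
with open("gec_train.src") as f:
    sentences = [line.strip() for line in f if line.strip()]

encodings = tokenizer(sentences, truncation=True, max_length=128)
examples = [{"input_ids": ids} for ids in encodings["input_ids"]]

# The collator applies BERT-style random masking (15% of tokens) on the fly.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
loader = DataLoader(examples, batch_size=32, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("bert-finetuned-gec")  # later used as a frozen feature extractor
```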

Highlights

  • Grammatical Error Correction (GEC) is a sequence-to-sequence task in which a model corrects an ungrammatical sentence into a grammatical sentence

  • Our experiments show that using the output of the fine-tuned BERT model as additional features in the GEC model (method (c)) is the most effective way of using BERT across most of the GEC corpora used in our experiments

  • For the BERT-initialized GEC model, we conducted experiments based on the open-source code


Summary

Introduction

Grammatical Error Correction (GEC) is a sequence-to-sequence task in which a model corrects an ungrammatical sentence into a grammatical sentence. We employ BERT, which is a widely used MLM (Qiu et al., 2020), and evaluate the following three methods: (a) initialize an EncDec GEC model using pre-trained BERT as in Lample and Conneau (2019) (BERT-init), (b) pass the output of pre-trained BERT into the EncDec GEC model as additional features (BERT-fuse) (Zhu et al., 2020), and (c) combine the best parts of (a) and (b). In this new method (c), we first fine-tune BERT with the GEC corpus and then use the output of the fine-tuned BERT model as additional features in the GEC model. The best-performing model achieves state-of-the-art results on the BEA-2019 and CoNLL-2014 benchmarks.
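To illustrate how methods (b) and (c) can feed BERT's output into the EncDec model as additional features, the following is a minimal PyTorch sketch of a BERT-fuse-style encoder layer. It is an illustrative simplification, not the authors' implementation: the class name, dimensions, and the simple averaging of the two attention outputs are assumptions, and Zhu et al. (2020) describe additional techniques (e.g., drop-net) not shown here.

```python
# Illustrative sketch: an encoder layer that attends both over its own states and
# over the hidden states of a (fine-tuned, frozen) BERT model, combining the two.
import torch
import torch.nn as nn

class BertFuseEncoderLayer(nn.Module):
    def __init__(self, d_model=512, d_bert=768, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout,
                                               batch_first=True)
        # Cross-attention from encoder states (queries) to BERT features (keys/values).
        self.bert_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout,
                                               kdim=d_bert, vdim=d_bert, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                 nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, bert_states):
        # x: (batch, src_len, d_model) encoder states; bert_states: (batch, bert_len, d_bert)
        self_out, _ = self.self_attn(x, x, x)
        bert_out, _ = self.bert_attn(x, bert_states, bert_states)
        # Average the two attention outputs before the residual connection (an assumption).
        x = self.norm1(x + self.dropout(0.5 * (self_out + bert_out)))
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Toy usage: BERT hidden states are computed once per batch with the frozen,
# fine-tuned BERT and passed to every fused layer.
layer = BertFuseEncoderLayer()
x = torch.randn(2, 10, 512)            # toy encoder states
bert_states = torch.randn(2, 12, 768)  # toy BERT hidden states
print(layer(x, bert_states).shape)     # torch.Size([2, 10, 512])
```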

Related Work
Methods for Using Pre-trained MLM in GEC Model
BERT-init
BERT-fuse
BERT-fuse Mask and GED
Evaluating GEC Performance
Train and Development Sets
Models
Pseudo-data
Results
Hidden Representation Visualization
Performance for Each Error Type
Conclusion