Abstract

Spelling Error Correction (SEC), which requires high-level language understanding, is a challenging but useful task. Current SEC approaches normally follow a pre-training-then-fine-tuning procedure that treats all training data equally. By contrast, Curriculum Learning (CL) utilizes training data differently during training and has proven effective at improving both performance and training efficiency in many other NLP tasks. In NMT, a model's performance has been shown to be sensitive to the difficulty of training examples, and CL has been shown to address this effectively. In SEC, the data from different language learners are naturally distributed across difficulty levels (some errors made by beginners are easy to correct, while some made by fluent speakers are hard), so we expect that designing a corresponding curriculum for model learning may likewise aid training and yield better performance. In this paper, we study how to further improve the performance of the state-of-the-art SEC method with CL, and propose a Self-Supervised Curriculum Learning (SSCL) approach. Specifically, we directly use the cross-entropy loss as the criterion for 1) scoring the difficulty of training data and 2) evaluating the competence of the model. In our approach, CL improves the model training, which in turn improves the CL measurement. In our experiments on the SIGHAN 2015 Chinese spelling check task, we show that SSCL is superior to previous norm-based and uncertainty-aware approaches, and establish a new state of the art (74.38% F1).
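
As a rough illustration of the scoring criterion described above, the sketch below computes a per-example difficulty as the average token-level cross-entropy under a pre-trained correction model, and a simple competence estimate as the mean loss over held-out pairs. The model interface, function names, and the loss-to-competence mapping are illustrative assumptions, not the paper's released code or exact formulas.

```python
import torch
import torch.nn.functional as F

def example_difficulty(model, src_ids, tgt_ids, pad_id=0):
    """Score one (source, target) pair by the average token cross-entropy of a
    pre-trained correction model: the higher the loss, the harder the example.
    (Hypothetical interface: `model` maps a batch of source ids to per-token
    logits over the output vocabulary.)"""
    model.eval()
    with torch.no_grad():
        logits = model(src_ids.unsqueeze(0)).squeeze(0)      # (seq_len, vocab)
        loss = F.cross_entropy(logits, tgt_ids,
                               ignore_index=pad_id, reduction="mean")
    return loss.item()

def model_competence(model, dev_pairs, pad_id=0):
    """Estimate current competence from the mean cross-entropy on held-out
    pairs; lower loss indicates a more competent model. The paper's exact
    loss-to-competence mapping is not reproduced here."""
    losses = [example_difficulty(model, s, t, pad_id) for s, t in dev_pairs]
    return sum(losses) / max(len(losses), 1)
```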

Highlights

  • Being a very valuable natural language application, Spelling Error Correction (SEC) is a challenging task that needs high-level language understanding. Curriculum Learning (CL) (Bengio et al., 2009) facilitates model training in an easy-to-hard order

  • We study how to further improve the performance of the state-of-the-art SEC method with CL, and propose a Self-Supervised Curriculum Learning (SSCL) approach

  • In our experiments on the SIGHAN 2015 Chinese spelling check task, we show that Self-Supervised CL (SSCL) is superior to previous norm-based and uncertainty-aware approaches, and establish a new state of the art (74.38% F1)

Summary

Introduction

Spelling Error Correction (SEC) aims to automatically correct spelling errors in written text at either the word level or the character level (Yu and Li, 2014; Yu et al., 2014; Zhang et al., 2015; Wang et al., 2018; Hong et al., 2019; Wang et al., 2019a). Being a very valuable natural language application, SEC is a challenging task that needs high-level language understanding. The difficulty of SEC data is influenced by many factors, such as sentence length, word rarity, and the great diversity of errors, and previous CL approaches require careful manual design of data-difficulty measures and training curricula. We therefore propose a novel Self-Supervised CL (SSCL) approach that evaluates data difficulty from the model's own perspective and automatically arranges the curriculum, avoiding hand-crafted CL measurements while improving the state-of-the-art SEC model. Concretely, the difficulty d(x_n, y_n) of each of the N training pairs is computed with the pre-trained system θ (Eqs. 1 and 2), and the curriculum is built from these scores.
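
Given per-example difficulty scores, the curriculum itself can be arranged by admitting, at each stage, only the examples whose difficulty percentile falls within the model's current competence, so training proceeds from easy to hard. The sketch below follows that generic competence-based recipe; the paper's exact difficulty and competence formulas (Eqs. 1 and 2) and the hyperparameter λ are not reproduced, so the schedule shown is an assumption for illustration.

```python
import numpy as np

def build_curriculum(difficulties, competence):
    """Return indices of training examples admitted at the current stage:
    an example is kept if its difficulty percentile (rank normalized to
    [0, 1]) does not exceed the model's current competence in [0, 1]."""
    n = len(difficulties)
    ranks = np.argsort(np.argsort(difficulties)) / max(n - 1, 1)
    return [i for i, r in enumerate(ranks) if r <= competence]

# Example: with competence 0.3, only the easiest examples are admitted.
diffs = [2.1, 0.4, 1.3, 0.9, 3.0]
print(build_curriculum(diffs, competence=0.3))   # -> [1, 3]
```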

Self-Supervised Curriculum Learning
Data Difficulty
Data Weight
Experiments
Model Competence
Method
Effects of Hyperparameter λs
Related Work
Findings
Conclusion