Abstract

The lack of labeled data is one of the major bottlenecks for Chinese Spelling Check. Existing research expands the supervised corpus by automatically generating training data from unlabeled text. However, there is a large gap between real input scenarios and such automatically generated corpora. We therefore develop a competitive general speller, ECSpell, which adopts an error-consistent masking strategy to create pretraining data. This strategy keeps the error types of automatically generated sentences consistent with those observed in real scenarios. Experimental results show that our model outperforms previous state-of-the-art models on the general benchmark. Moreover, spellers often operate within a particular domain in practice, and experiments on the domain-specific datasets we build show that general models perform poorly due to the many uncommon domain terms. Inspired by the common practice of input methods, we propose adding an alterable user dictionary to handle the zero-shot domain adaptation problem. Specifically, we attach a user dictionary guided inference module (UD) to a general token classification based speller. Our experiments demonstrate that ECSpell UD, namely ECSpell combined with UD, surpasses all other baselines by a large margin, even approaching its performance on the general benchmark.
