Abstract

The task of Chinese Spelling Check (CSC) is to identify and correct spelling errors in text, which are mainly caused by phonologically and visually similar characters. Although pre-trained language models are helpful for this task, they lack phonological and visual information. Previous works have primarily focused on identifying errors from local context while neglecting the importance of sentence-level information. To address these issues, Contrastive Learning Spell (CLSpell) is proposed, which combines phonetic and glyph information through contrastive learning and simultaneously acquires local and global information through multi-task joint learning. During pre-training, token representations are learned using a combination of phonological, visual, and semantic information. Moreover, an auxiliary task of correct-sentence discrimination is included in the multi-task joint training process to capture sentence-level information. Experiments on widely used benchmarks demonstrate that the proposed method surpasses all competing methods.
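The two training signals described above can be illustrated with a minimal sketch: an InfoNCE-style contrastive loss that pulls a character's semantic representation toward a positive (e.g. its phonetic/glyph representation) and away from negatives, and a joint objective that adds a weighted auxiliary sentence-discrimination loss to the correction loss. All function names, the temperature, and the weighting factor here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: the anchor is pulled toward its
    positive and pushed away from the negatives (temperature is assumed)."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = np.array(sims) / temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                   # positive sits at index 0

def joint_loss(correction_loss, discrimination_loss, lam=0.5):
    # multi-task objective: correction loss plus a weighted auxiliary
    # sentence-discrimination loss (lam is an assumed hyperparameter)
    return correction_loss + lam * discrimination_loss
```

With this objective, a representation aligned with its positive yields a near-zero contrastive loss, while one aligned with a negative is penalized heavily.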
