Abstract

The task of Chinese Spelling Check (CSC) is to identify and correct spelling errors in text, which are mainly caused by phonologically and visually similar characters. Although pre-trained language models are helpful for this task, they lack phonological and visual information. Previous works have primarily focused on identifying errors from local context while neglecting the importance of sentence-level information. To address these issues, Contrastive Learning Spell (CLSpell) is proposed, which combines phonetic and glyph information through contrastive learning and simultaneously acquires local and global information through multi-task joint learning. During pre-training, token representations are learned using a combination of phonological, visual, and semantic information. Moreover, an auxiliary task of correct-sentence discrimination is included in the multi-task joint training process to capture sentence-level information. Experiments on widely used benchmarks demonstrate that the proposed method surpasses all competing methods.
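The two training signals described above can be illustrated with a minimal sketch: an InfoNCE-style contrastive loss that pulls a character's semantic representation toward a positive (e.g. its phonetic/glyph representation) and away from negatives, and a joint objective that adds a weighted auxiliary sentence-discrimination loss to the correction loss. All function names, the temperature, and the weighting factor here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: the anchor is pulled toward its
    positive and pushed away from the negatives (temperature is assumed)."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = np.array(sims) / temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                   # positive sits at index 0

def joint_loss(correction_loss, discrimination_loss, lam=0.5):
    # multi-task objective: correction loss plus a weighted auxiliary
    # sentence-discrimination loss (lam is an assumed hyperparameter)
    return correction_loss + lam * discrimination_loss
```

With this objective, a representation aligned with its positive yields a near-zero contrastive loss, while one aligned with a negative is penalized heavily.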
