Abstract

Chinese spelling correction (CSC) aims to automatically detect and correct spelling errors in Chinese sentences. Recently, methods that combine a pre-trained language model with external knowledge have achieved excellent performance. The knowledge is derived either from multi-modal information, such as pronunciations and glyphs, or from a confusion set that collects pairs of easily confused characters. However, existing advanced methods based on multi-modal knowledge achieve their superior performance at the cost of a substantially larger model. Moreover, although context semantics is essential for CSC, current confusion set based methods do not use the confusion set to model semantics, because they do not fuse the lexical features of the confusing characters. To address these issues, we propose an Adapter-based BERT-level Confusion Set Fusion method, which fuses BERT with the semantics of confusing characters in the semantic encoding phase. A lightweight adapter is placed between BERT layers; it dynamically extracts the relevant knowledge from the confusing candidates and integrates it with the context. In this way, the contextual information and the semantics of the candidates can fully interact within BERT. Experiments are conducted on three benchmarks. The results demonstrate that our method outperforms previous confusion set based methods and performs comparably to the multi-modal knowledge based methods. Code is available at https://github.com/jying2023/ABC-Fusion.
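
The abstract does not specify the adapter's internals, so the following is only a minimal sketch of one way such a fusion step could work, assuming the adapter attends from a character's contextual hidden state over the embeddings of its confusion set candidates and adds the weighted result back to the hidden state. The function name `fuse_confusion_candidates`, the scaled dot-product scoring, and the fixed `gate` value are all illustrative assumptions, not the authors' design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_confusion_candidates(hidden, candidate_embeddings, gate=0.5):
    """Hypothetical adapter step for one character position.

    hidden: contextual vector produced by a BERT layer for this character.
    candidate_embeddings: vectors of the characters in its confusion set.
    gate: assumed fixed mixing weight (a real adapter would likely learn it).
    """
    if not candidate_embeddings:
        # No confusable characters: pass the hidden state through unchanged.
        return list(hidden)

    # Score each candidate against the context (scaled dot-product attention).
    scale = math.sqrt(len(hidden))
    weights = softmax([dot(hidden, c) / scale for c in candidate_embeddings])

    # Weighted sum of candidates: the knowledge selected for this context.
    knowledge = [
        sum(w * c[i] for w, c in zip(weights, candidate_embeddings))
        for i in range(len(hidden))
    ]

    # Residual fusion: the context vector plus the gated candidate knowledge.
    return [h + gate * k for h, k in zip(hidden, knowledge)]


if __name__ == "__main__":
    context = [1.0, 0.0]
    candidates = [[1.0, 0.0], [0.0, 1.0]]
    print(fuse_confusion_candidates(context, candidates))
```

In this sketch the candidate that is most similar to the context receives the largest attention weight, which is one simple reading of extracting relevant knowledge "dynamically"; the output would then be passed on to the next BERT layer.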
