Abstract

Spelling check for Chinese has more challenging difficulties than that for other languages. A hybrid model for Chinese spelling check is presented in this article. The hybrid model consists of three components: one graph-based model for generic errors and two independently trained models for specific errors. In the graph model, a directed acyclic graph is generated for each sentence, and the single-source shortest-path algorithm is performed on the graph to detect and correct general spelling errors at the same time. Prior to that, two types of errors over functional words (characters) are first solved by conditional random fields: the confusion of “在” ( at ) (pinyin is zai in Chinese), “再” ( again , more , then ) (pinyin: zai ) and “的” ( of ) (pinyin: de ), “地” (- ly , adverb-forming particle) (pinyin: de ), and “得” ( so that , have to ) (pinyin: de ). Finally, a rule-based model is exploited to distinguish pronoun usage confusion: “她” ( she ) (pinyin: ta ), “他” ( he ) (pinyin: ta ), and some other common collocation errors. The proposed model is evaluated on the standard datasets released by the SIGHAN Bake-off shared tasks, giving state-of-the-art results.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.