Abstract

This paper introduces the DM_NLP team's system for the NLPTEA 2018 shared task on Chinese Grammatical Error Diagnosis (CGED), which can be used to detect and correct grammatical errors in texts written by Chinese as a Foreign Language (CFL) learners. The task aims not only to detect four types of grammatical errors, namely redundant words (R), missing words (M), bad word selection (S) and disordered words (W), but also to recommend corrections for errors of types M and S. We proposed a hybrid system of four models that works in two stages: a detection stage and a correction stage. In the detection stage, we first used a BiLSTM-CRF model with some handcrafted features to tag potential errors by sequence labeling. We then designed three Grammatical Error Correction (GEC) models to generate corrections, which also helped to tune the detection result. In the correction stage, candidates were generated by the three GEC models and then merged to produce the final corrections for the M and S types. Our system reached the highest precision in the correction subtask, which was the most challenging part of this shared task, and ranked in the top three by F1 score for error position detection.
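To make the sequence-labeling view of detection concrete, the following is a minimal sketch of how per-character tags could be turned into error spans. The abstract does not specify the tagging scheme or the output format, so the BIO scheme, the 1-indexed inclusive offsets, and the helper name `tags_to_spans` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# The four CGED error types named in the abstract.
ERROR_TYPES = {"R", "M", "S", "W"}


def tags_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert per-character BIO tags (an assumed scheme) into error spans.

    Returns (start, end, error_type) tuples with 1-indexed, inclusive
    offsets. Both the scheme and the offset convention are assumptions.
    """
    spans: List[Tuple[int, int, str]] = []
    start = None    # start position of the span currently being built
    current = None  # error type of the span currently being built
    for i, tag in enumerate(tags, start=1):
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        # Close the open span when the current tag does not extend it.
        if current is not None and not continues:
            spans.append((start, i - 1, current))
            start, current = None, None
        # Open a new span on a B- tag with a known error type.
        if prefix == "B" and etype in ERROR_TYPES:
            start, current = i, etype
        # A stray I- tag without a matching B- tag is ignored.
    if current is not None:
        spans.append((start, len(tags), current))
    return spans


if __name__ == "__main__":
    example = ["O", "B-R", "O", "B-S", "I-S", "O"]
    print(tags_to_spans(example))
```

In this sketch, a sentence tagged `O B-R O B-S I-S O` yields one redundant-word span at position 2 and one word-selection span covering positions 4 to 5; only the M and S spans would then be passed on to a correction stage.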

Highlights

  • More and more people are learning a second or third language as an interest, a career plus, or even a challenge to oneself

  • We proposed a hybrid system for the Chinese Grammatical Error Diagnosis (CGED) shared task this year, which contained two stages: the detection stage and the correction stage

  • We found that our Grammatical Error Correction (GEC) models can focus on different types of errors, as shown in Table 6 on the official testing data of CGED 2018, which is denoted as ‘18-test’


Summary

Introduction

More and more people are learning a second or third language as an interest, a career plus, or even a challenge to oneself. Chinese is one of the oldest and most versatile languages in the world. It can be difficult to learn because it differs from other languages in many ways. Chinese marks neither singular and plural forms nor verb tense, and it has quite flexible expressions and a loosely structured grammar. These traits cause a lot of trouble for CFL learners, so the demand for Chinese Grammatical Error Diagnosis (CGED) as well as Correction (CGEC) is growing rapidly. GEC for English has been studied for many years, with many shared tasks such as CoNLL-2013 (Ng et al., 2013) and CoNLL-2014 (Ng et al., 2014), while comparable studies on Chinese are still few.

