Abstract

State-of-the-art sequence labeling systems require large amounts of task-specific knowledge in the form of handcrafted features and data pre-processing, and they are typically built on news corpora. An English as a second language (ESL) corpus is collected from articles written by English learners; because such text is full of grammatical mistakes, sequence labeling on it is considerably more difficult. We propose a two-stage deep neural network architecture for sequence labeling that enables the higher layer to make use of the coarse-grained labeling information produced by the lower layer. We evaluate our model on three datasets for three sequence labeling tasks: the Penn Treebank WSJ corpus for part-of-speech (POS) tagging, the CoNLL 2003 corpus for named entity recognition (NER), and the CoNLL 2013 corpus for grammatical error correction (GEC). We obtain state-of-the-art performance on all three datasets: 97.60% accuracy for POS tagging, 91.38% F1 for NER, and, for GEC, 38% F1 on determiner error correction and 28.89% F1 on preposition error correction. We also evaluate our system on the ESL corpus PiGai for POS tagging and obtain 96.73% accuracy. The implementation of our network is publicly available.
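The abstract does not specify the network internals, so the following is only a minimal sketch of the general idea of a two-stage labeler in which the higher stage consumes the lower stage's coarse-grained label distribution. It assumes PyTorch, BiLSTM encoders, and illustrative layer sizes; none of these choices, nor the class and parameter names, are taken from the paper.

```python
# Minimal sketch (assumed architecture, not the authors' implementation):
# stage 1 predicts coarse labels from word embeddings; stage 2 tags with
# fine-grained labels using both the embeddings and stage 1's predictions.
import torch
import torch.nn as nn

class TwoStageTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim,
                 num_coarse_labels, num_fine_labels):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Lower stage: coarse-grained labeling from word embeddings.
        self.stage1 = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.coarse_out = nn.Linear(2 * hidden_dim, num_coarse_labels)
        # Higher stage: sees embeddings plus the coarse label distribution.
        self.stage2 = nn.LSTM(emb_dim + num_coarse_labels, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.fine_out = nn.Linear(2 * hidden_dim, num_fine_labels)

    def forward(self, token_ids):
        emb = self.embed(token_ids)              # (batch, seq, emb_dim)
        h1, _ = self.stage1(emb)                 # (batch, seq, 2*hidden)
        coarse_logits = self.coarse_out(h1)      # per-token coarse scores
        coarse_probs = coarse_logits.softmax(dim=-1)
        # Pass the lower stage's labeling information to the higher stage.
        h2, _ = self.stage2(torch.cat([emb, coarse_probs], dim=-1))
        fine_logits = self.fine_out(h2)          # per-token fine scores
        return coarse_logits, fine_logits

# Toy usage: a batch of 2 sentences of length 7 with made-up dimensions.
model = TwoStageTagger(vocab_size=1000, emb_dim=50, hidden_dim=100,
                       num_coarse_labels=12, num_fine_labels=45)
coarse, fine = model(torch.randint(0, 1000, (2, 7)))
```

In practice both stages would be trained jointly, with a loss on the coarse predictions and a loss on the fine predictions, so that the lower layer learns labels that are useful to the layer above it.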
