Abstract

This paper surveys methods for annotation error detection and correction. Methods can broadly be characterized by whether they detect inconsistencies with respect to a statistical model based only on the corpus data, or with respect to a grammatical model or, more generally, some external information source. Two extended examples are presented, illustrating these different techniques: (1) the variation n‐gram method, which searches for inconsistencies in annotation for identical strings; and (2) a method of ad hoc rule detection, for syntactic annotation, which compares treebank rules to a grammar to determine which are anomalous. Methods for detecting annotation errors have developed considerably over the last decade. Corpus practitioners can therefore benefit greatly from them, while NLP researchers can learn more about the nuances of the annotation they use and see how error correction methods intersect with NLP techniques.
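
To make the first technique concrete, the following is a minimal sketch of the idea behind searching for inconsistent annotation of identical strings, not the surveyed implementation. It assumes a toy part-of-speech corpus given as lists of (word, tag) pairs; the function name `variation_ngrams`, the fixed n-gram length, and the example sentences are all hypothetical. A word position inside a repeated n-gram is flagged when it receives more than one tag across occurrences. Flagged items are only candidates: genuine ambiguity can also produce variation, so each one still needs human review.

```python
from collections import defaultdict


def variation_ngrams(tagged_sentences, n=3):
    """Flag word positions that receive different tags in identical word n-grams.

    Returns a dict mapping (ngram, position) to the set of tags observed
    for the word at that position, keeping only positions with more than one tag.
    """
    seen = defaultdict(set)
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        tags = [t for _, t in sent]
        # Slide a window of n words over the sentence.
        for i in range(len(sent) - n + 1):
            ngram = tuple(words[i:i + n])
            # Record the tag seen at every position of this word n-gram.
            for k in range(n):
                seen[(ngram, k)].add(tags[i + k])
    # Variation: the same word in the same context carries several tags.
    return {key: tags for key, tags in seen.items() if len(tags) > 1}


# Hypothetical toy corpus: "fish" is tagged VB in one sentence, NN in another.
corpus = [
    [("we", "PRP"), ("can", "MD"), ("fish", "VB"), ("here", "RB")],
    [("we", "PRP"), ("can", "MD"), ("fish", "NN"), ("here", "RB")],
    [("they", "PRP"), ("can", "MD"), ("swim", "VB")],
]

flagged = variation_ngrams(corpus, n=3)
for (ngram, k), tags in sorted(flagged.items()):
    print(" ".join(ngram), "| varying word:", ngram[k], "| tags:", sorted(tags))
```

Requiring identical surrounding words is what separates likely errors from ordinary ambiguity: the more shared context around a varying word, the less plausible it is that both annotations are correct.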
