Abstract

This paper surveys methods for annotation error detection and correction. Methods can broadly be characterized by whether they detect inconsistencies with respect to a statistical model based only on the corpus data, or with respect to a grammatical model or, more generally, some external information source. Two extended examples illustrate these different techniques: (1) the variation n-gram method, which searches for inconsistencies in the annotation of identical strings; and (2) a method of ad hoc rule detection for syntactic annotation, which compares treebank rules to a grammar to determine which are anomalous. Methods for detecting annotation errors have developed considerably over the last decade, so corpus practitioners can benefit greatly from them, while NLP researchers can learn more about the nuances of the annotation they use and see how error correction methods intersect with NLP techniques.
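To make the two techniques concrete, here is a minimal Python sketch, assuming token-level labels such as POS tags; the function names (`variation_ngrams`, `adhoc_rules`) and the toy data are illustrative, not from the paper, and the per-token labeling is a simplification of the variation n-gram method, which generalizes to other annotation types. The first function flags identical n-grams that receive differing label sequences; the second flags treebank productions absent from a reference grammar.

```python
from collections import defaultdict

def variation_ngrams(tokens, labels, n=3):
    """Group every token n-gram with the label sequences it receives;
    identical n-grams with more than one labeling are variation nuclei,
    i.e. candidate annotation errors. (Illustrative simplification.)"""
    labelings = defaultdict(set)    # n-gram -> set of label tuples seen
    positions = defaultdict(list)   # n-gram -> start offsets in the corpus
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        labelings[gram].add(tuple(labels[i:i + n]))
        positions[gram].append(i)
    return {g: (labelings[g], positions[g])
            for g in labelings if len(labelings[g]) > 1}

def adhoc_rules(treebank_rules, grammar_rules):
    """Flag treebank productions that a reference grammar does not license;
    such ad hoc rules are candidates for syntactic annotation errors."""
    return set(treebank_rules) - set(grammar_rules)

# Toy usage: "off" is tagged RP in one occurrence and IN in the other,
# so the shared trigrams around it are flagged as variation nuclei.
tokens = ["ran", "off", "the", "stage", ".", "ran", "off", "the", "stage", "."]
labels = ["VBD", "RP", "DT", "NN", ".", "VBD", "IN", "DT", "NN", "."]
for gram, (tag_sets, offsets) in variation_ngrams(tokens, labels).items():
    print(gram, "->", sorted(tag_sets), "at", offsets)

# Toy rule comparison: NP -> NN DT is not in the reference grammar.
print(adhoc_rules({("NP", ("DT", "NN")), ("NP", ("NN", "DT"))},
                  {("NP", ("DT", "NN"))}))
```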
