Abstract

There is a widely recognized need for a general framework for linguistic annotation that is flexible and extensible enough to accommodate different annotation types and different theoretical and practical approaches, while also enabling their representation in a “pivot” format that can serve as the basis for comparative evaluation, merging, and the development of reusable editing and processing tools. To address this need, we have developed a framework built around an abstract model for a variety of annotation types (e.g., morpho-syntactic tagging, syntactic annotation, and coreference annotation), which can be instantiated in different ways depending on the annotator’s approach and goals. The results have been incorporated into XCES (Ide et al., 2000), the XML instantiation of the Corpus Encoding Standard (Ide, 1998a,b), which provides a ready-made, standard encoding format together with a data architecture designed specifically for linguistically annotated corpora.

Keywords: Corpus Annotation Standards; Extended Markup Language; Resource Description Framework
