Multilevel Annotation for Information Extraction

Jin-Dong Kim, Jun'ichi Tsujii, Tomoko Ohta

doi:10.1007/978-90-481-3331-4_7

Abstract

Information Extraction (IE) is the broad task of detecting and extracting specific structured information from unstructured natural language text. IE typically requires linguistic analysis to determine the structure of the text and semantic processing to map those linguistic structures to semantic ones. For real-world applications, this processing often needs to be performed at several levels, determining, e.g., parts of speech, syntactic structure, named entities, and events. Multilevel annotations made to a corpus are a necessary resource for developing multilevel text processing tools and, eventually, automatic IE systems: they serve both as reference and training material for method development and as benchmark data sets. This chapter introduces the GENIA corpus and the various annotations made to it as an example of multilevel annotation for IE, and discusses general issues in multilevel annotation.
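To make the idea of annotation at several levels concrete, here is a minimal sketch. It assumes a standoff representation in which each level stores character offsets into the same unmodified base text. The `Span`, `Event`, and `Document` classes, the example sentence, and its labels are hypothetical illustrations. They do not reproduce the GENIA corpus's actual file format or tag set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A labeled character range over the unmodified base text (standoff style)."""
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class Event:
    """An event anchored on a trigger span, with (role, argument) pairs."""
    trigger: Span
    args: tuple[tuple[str, Span], ...]


@dataclass
class Document:
    text: str
    # One list of spans per annotation level; every level points into the same text,
    # so a new level can be added without modifying the others.
    layers: dict[str, list[Span]] = field(default_factory=dict)
    events: list[Event] = field(default_factory=list)

    def add(self, layer: str, start: int, end: int, label: str) -> Span:
        # Reject empty spans and spans that fall outside the base text.
        if not 0 <= start < end <= len(self.text):
            raise ValueError(f"span {start}-{end} is outside the text")
        span = Span(start, end, label)
        self.layers.setdefault(layer, []).append(span)
        return span

    def covered(self, layer: str, outer: Span) -> list[Span]:
        """Spans of `layer` lying inside `outer`: a cross-level lookup via offsets."""
        return [s for s in self.layers.get(layer, [])
                if outer.start <= s.start and s.end <= outer.end]

    def surface(self, span: Span) -> str:
        return self.text[span.start:span.end]


# Illustrative sentence and labels (hypothetical, not taken from the GENIA files).
doc = Document("IL-2 activates NF-kappaB.")
for start, end, tag in [(0, 4, "NN"), (5, 14, "VBZ"), (15, 24, "NN"), (24, 25, ".")]:
    doc.add("pos", start, end, tag)
il2 = doc.add("entity", 0, 4, "Protein")
nfkb = doc.add("entity", 15, 24, "Protein")
trigger = doc.add("trigger", 5, 14, "Positive_regulation")
doc.events.append(Event(trigger, (("Cause", il2), ("Theme", nfkb))))

for event in doc.events:
    roles = ", ".join(f"{role}={doc.surface(arg)}" for role, arg in event.args)
    print(f"{event.trigger.label}({roles})")  # Positive_regulation(Cause=IL-2, Theme=NF-kappaB)
```

The levels share only the offsets, so a query such as `doc.covered("pos", nfkb)` links the named-entity level to the part-of-speech level without either layer referring to the other. The trade-off is that every layer must be built against exactly the same base text: changing a single character shifts all later offsets.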

Full Text