Large-scale extraction and use of knowledge from text

Peter Clark, Phil Harrison

doi:10.1145/1597735.1597763

Abstract

Many AI tasks, in particular natural language processing, require a large amount of world knowledge to create expectations, assess plausibility, and guide disambiguation. However, acquiring this world knowledge remains a formidable challenge. Building on ideas by Schubert, we have developed a system called DART (Discovery and Aggregation of Relations in Text) that extracts simple, semi-formal statements of world knowledge (e.g., airplanes can fly, people can drive cars) from text by abstracting from a parser's output, and we have used it to create a database of 23 million propositions of this kind. An evaluation of the DART database on two language processing tasks (parsing and textual entailment) shows that it improves performance, and a human evaluation shows that over half the facts in it are considered true or partially true, rising to 70% for facts seen with high frequency. The significance of this work is twofold: first, it has created a new, publicly available knowledge resource for language processing and other data interpretation tasks, and second, it provides empirical evidence of the utility of this type of knowledge, going beyond Schubert et al.'s earlier evaluations, which were based solely on human inspection of its contents.
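As a rough illustration of the extract-and-aggregate idea described above, the following Python sketch abstracts toy clauses into tuple-style propositions and counts how often each one occurs. It is not DART's implementation: the clause format, the function names (`to_proposition`, `aggregate`, `frequent`), and the frequency threshold are all assumptions made for this example, and the input is hand-written rather than produced by a real parser.

```python
from collections import Counter

# Hypothetical stand-in for parser output: (subject head, verb, object head or None).
# A real system would derive these from parse trees; here they are hand-written.
PARSED_CLAUSES = [
    ("airplane", "fly", None),
    ("airplane", "fly", None),
    ("person", "drive", "car"),
    ("person", "drive", "car"),
    ("person", "drive", "car"),
    ("car", "drive", "person"),  # a noisy extraction, seen only once
]


def to_proposition(subject, verb, obj):
    """Abstract one clause into a tuple-style proposition (assumed format)."""
    return (subject, verb) if obj is None else (subject, verb, obj)


def aggregate(clauses):
    """Count how many times each abstracted proposition was observed."""
    return Counter(to_proposition(*clause) for clause in clauses)


def frequent(counts, min_count):
    """Keep only propositions observed at least min_count times."""
    return {prop: n for prop, n in counts.items() if n >= min_count}


counts = aggregate(PARSED_CLAUSES)
high_freq = frequent(counts, min_count=2)
```

In this toy setting the frequency filter drops the one-off noisy tuple, which mirrors the abstract's observation that facts seen with high frequency are more often judged true.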

Full Text