Abstract
We present DefIE, an approach to large-scale Information Extraction (IE) based on a syntactic-semantic analysis of textual definitions. Given a large corpus of definitions, we leverage syntactic dependencies to reduce data sparsity, then disambiguate the arguments and content words of the relation strings, and finally exploit the resulting information to organize the acquired relations hierarchically. The output of DefIE is a high-quality knowledge base consisting of several million automatically acquired semantic relations.
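As a rough illustration of the three stages named above (syntactic extraction, disambiguation, hierarchical organization), the toy sketch below walks a single copular definition through a simplified pipeline. Everything here is a hypothetical stand-in: the regex replaces a real dependency parse, the tiny `SENSES` dictionary replaces a BabelNet-style sense inventory, and the head-verb grouping is only a crude proxy for the paper's relation hierarchy.

```python
import re

# Hypothetical sense inventory standing in for a real lexical resource
# (these sense labels are illustrative, not the paper's actual output).
SENSES = {"guitar": "guitar#n#1", "strings": "string#n#3"}

def extract_triple(definition):
    """Naive stand-in for the syntactic step: match a copular
    definition 'A/An X is a Y that/which VERB OBJ' and return a
    (subject, relation string, object) triple."""
    m = re.match(
        r"An? (\w+) is an? ([\w ]+?) (?:that|which) (\w+) ([\w ]+?)\.?$",
        definition,
    )
    if not m:
        return None
    subj, hypernym, verb, obj = m.groups()
    return (subj, f"is a {hypernym} that {verb}", obj)

def disambiguate(triple):
    """Stand-in for the semantic step: attach word senses to the
    arguments when they appear in the toy inventory."""
    subj, rel, obj = triple
    tag = lambda w: SENSES.get(w, w)
    return (tag(subj), rel, tag(obj))

def relation_hierarchy(relations):
    """Crude stand-in for the hierarchy step: group relation strings
    by their final (head) verb, so more specific relation strings
    cluster under the same head as their generalizations."""
    tree = {}
    for r in relations:
        tree.setdefault(r.split()[-1], []).append(r)
    return tree
```

For example, `extract_triple("A guitar is a musical instrument that has strings.")` yields `("guitar", "is a musical instrument that has", "strings")`, which `disambiguate` turns into a sense-tagged triple; `relation_hierarchy` then files that relation string under the head verb `has` alongside the bare relation `has`.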
Highlights
The problem of knowledge acquisition lies at the core of Natural Language Processing
A more radical approach is adopted in systems like TEXTRUNNER (Etzioni et al., 2008) and REVERB (Fader et al., 2011), which developed from the Open Information Extraction (OIE) paradigm (Etzioni et al., 2008) and focused on the unconstrained extraction of a large number of relations from massive unstructured corpora
We presented DefIE, an approach to OIE that, thanks to a novel unified syntactic-semantic analysis of text, harvests instances of semantic relations from a corpus of textual definitions
Summary
The problem of knowledge acquisition lies at the core of Natural Language Processing. A more radical approach is adopted in systems like TEXTRUNNER (Etzioni et al., 2008) and REVERB (Fader et al., 2011), which developed from the Open Information Extraction (OIE) paradigm (Etzioni et al., 2008) and focused on the unconstrained extraction of a large number of relations from massive unstructured corpora. All these endeavors were geared towards addressing the knowledge acquisition problem and tackling long-standing challenges in the field, such as Machine Reading (Mitchell, 2005).