Abstract

Open Information Extraction (Open IE) aims to extract domain-independent relations from text without requiring them to be predefined. This article introduces the Open IE research field, thoroughly discussing its main ideas and systems as well as its main challenges and open issues. It then describes an open extractor built on the premise that good Open IE performance does not require an enormous list of patterns or several types of linguistic labels. The extractor relies on generic patterns that identify relations not previously specified, including rules that correspond to Cimiano and Wenderoth's proposal for learning Qualia structures. Named LSOE (Lexical-Syntactic pattern-based Open Extractor) and designed to validate this strategy, the extractor is presented and its performance is compared with that of two Open IE systems. The results show that LSOE extracts relations that the other extractors do not learn while achieving comparable precision. The work contributes a new pattern-matching approach to Open IE and demonstrates the feasibility of an extractor based on simple lexical-syntactic patterns.

Highlights

  • Open Information Extraction (Open IE) aims to extract domain-independent relations from text without requiring them to be predefined

  • The input of the first round was a corpus of 217 randomly selected sentences from Wikipedia articles related to the Philosophy of Language domain

  • We expect LSOE to obtain precision comparable to that of rule-based Open Information Extraction (Open IE) systems and to extract relations that those systems do not learn

Summary

Introduction

Open Information Extraction (Open IE) aims to extract domain-independent relations from text without requiring them to be predefined. Computational tools that extract and synthesize information from natural language text are important for building large-scale knowledge bases. Machine understanding of textual documents consists mainly of parsing unstructured text and transforming it into a structured representation. This representation should be unambiguous, making it suitable for machine reading and machine interpretation [1]. Angeli and Manning [3], for example, propose using relations extracted from text to enlarge databases of known facts and to predict facts, introducing the notion of fact similarity.
