Abstract

Open information extraction approaches are useful but insufficient alone for populating the Web with machine readable information, as their results are not directly linkable to, and immediately reusable from, other Linked Data sources. This work proposes a novel paradigm, named Open Knowledge Extraction, and its implementation (Legalo), which performs unsupervised, open domain, and abstractive knowledge extraction from text for producing machine readable information. The implemented method is based on the hypothesis that hyperlinks (whether created by humans or by knowledge extraction tools) provide a pragmatic trace of semantic relations between two entities, and that such semantic relations, their subjects, and their objects can be revealed by processing their linguistic traces (i.e. the sentences that embed the hyperlinks) and formalised as Semantic Web triples and ontology axioms. Experimental evaluations conducted on validated text extracted from Wikipedia pages, with the help of crowdsourcing, confirm this hypothesis, showing high performance. A demo is available at http://wit.istc.cnr.it/stlab-tools/legalo.

Highlights

  • Open information extraction approaches are useful but insufficient alone for populating the Web with machine readable information as their results are not directly linkable to, and immediately reusable from, other Linked Data sources

  • A link to “Usenet” in the Wikipedia page of “John McCarthy” suggests a semantic relation between those two entities, which is explained by the sentence: “McCarthy often commented on world affairs on the Usenet forums”

  • It is worth remarking that the evaluation experiment of LegaloWikipedia was performed by Linked Data experts; comparing the new results with the previous ones provides insights into the usability of the generated predicates, regardless of the expertise of the evaluators


Summary

Introduction

The Linked Data movement [2] realised the first substantiation of the Semantic Web vision by bootstrapping the publication of machine understandable information. Current KE systems address the task of linking pieces of text to Semantic Web entities very well (e.g. linking a mention of “Lisp” to wikipedia:Lisp_(programming_language)); some of them (e.g. NERD) also perform sense tagging, i.e. adding knowledge about entity types (rdf:type). What such systems do not capture is the relation that a hyperlink suggests: a link to “Usenet” in the Wikipedia page of “John McCarthy” suggests a semantic relation between those two entities, which is explained by the sentence: “McCarthy often commented on world affairs on the Usenet forums”.
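
To make the hypothesis concrete, the following minimal sketch (in Python, using rdflib) shows how the relation suggested by such a hyperlink could be formalised as a Semantic Web triple together with an ontology axiom typing the generated predicate. This is not the Legalo implementation: the predicate label commentOn and the namespace http://example.org/legalo/ are illustrative assumptions, whereas Legalo derives its predicates from the linguistic trace of the link.

    # Illustrative sketch, not the Legalo implementation: formalise the
    # relation suggested by a hyperlink as an RDF triple plus an ontology
    # axiom typing the generated predicate.
    from rdflib import Graph, Namespace, RDF
    from rdflib.namespace import OWL

    DBR = Namespace("http://dbpedia.org/resource/")
    EX = Namespace("http://example.org/legalo/")  # hypothetical namespace

    def formalise_hyperlink(page_entity, linked_entity, predicate_label):
        # Subject: the entity of the page embedding the link.
        # Object: the linked entity.
        # Predicate: generated from the sentence embedding the hyperlink
        # (here passed in directly as a label).
        g = Graph()
        g.bind("dbr", DBR)
        g.bind("ex", EX)
        g.add((EX[predicate_label], RDF.type, OWL.ObjectProperty))  # ontology axiom
        g.add((DBR[page_entity], EX[predicate_label], DBR[linked_entity]))  # triple
        return g

    # The link to "Usenet" in the Wikipedia page of "John McCarthy",
    # explained by "McCarthy often commented on world affairs on the
    # Usenet forums", might yield:
    g = formalise_hyperlink("John_McCarthy_(computer_scientist)", "Usenet", "commentOn")
    print(g.serialize(format="turtle"))

Serialising the result as RDF is what makes the extracted statement directly linkable from other Linked Data sources, which is exactly the gap the paper identifies in plain open information extraction output.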
