Abstract
Open information extraction approaches are useful but insufficient alone for populating the Web with machine readable information, as their results are not directly linkable to, and immediately reusable from, other Linked Data sources. This work proposes a novel paradigm, named Open Knowledge Extraction, and its implementation (Legalo), which performs unsupervised, open domain, and abstractive knowledge extraction from text for producing machine readable information. The implemented method is based on the hypothesis that hyperlinks (either created by humans or knowledge extraction tools) provide a pragmatic trace of semantic relations between two entities, and that such semantic relations, their subjects and objects, can be revealed by processing their linguistic traces (i.e. the sentences that embed the hyperlinks) and formalised as Semantic Web triples and ontology axioms. Experimental evaluations, conducted on validated text extracted from Wikipedia pages with the help of crowdsourcing, confirm this hypothesis, showing high performance. A demo is available at http://wit.istc.cnr.it/stlab-tools/legalo.
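To make the hypothesis concrete, the following is a minimal sketch (in Python with rdflib, not the authors' Legalo implementation) of what a triple derived from a hyperlink could look like, using the John McCarthy/Usenet example discussed in the highlights below. The ex: namespace and the predicate name commentOn are illustrative assumptions, not Legalo's actual output.

```python
# Minimal sketch (rdflib; not the authors' Legalo code) of the paper's idea:
# a hyperlink between two entities, plus the sentence that embeds it, yields
# a Semantic Web triple whose predicate captures the semantic relation.
from rdflib import Graph, Namespace

DBR = Namespace("http://dbpedia.org/resource/")
EX = Namespace("http://example.org/relation/")  # hypothetical namespace for generated predicates

g = Graph()
g.bind("dbr", DBR)
g.bind("ex", EX)

# The link to "Usenet" in John McCarthy's Wikipedia page, explained by the
# sentence "McCarthy often commented on world affairs on the Usenet forums",
# suggests a relation such as:
g.add((
    DBR["John_McCarthy_(computer_scientist)"],
    EX["commentOn"],  # illustrative predicate label, not Legalo's generated one
    DBR["Usenet"],
))

print(g.serialize(format="turtle"))
```

Serialising the graph produces a Turtle rendering of the single subject-predicate-object triple, which is the kind of machine readable, linkable output the paradigm aims to generate from each hyperlink.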
Highlights
Open information extraction approaches are useful but insufficient alone for populating the Web with machine readable information as their results are not directly linkable to, and immediately reusable from, other Linked Data sources
A link to “Usenet” in the Wikipedia page of “John McCarthy” suggests a semantic relation between those two entities, which is explained by the sentence: “McCarthy often commented on world affairs on the Usenet forums”
It is worth remarking that the evaluation experiment of LegaloWikipedia was performed by Linked Data experts; comparing the new results with the previous ones provides insights into the usability of the generated predicates, regardless of the evaluators' expertise
Summary
Open information extraction approaches are useful but insufficient alone for populating the Web with machine readable information, as their results are not directly linkable to, and immediately reusable from, other Linked Data sources. The implemented method is based on the hypothesis that hyperlinks (either created by humans or knowledge extraction tools) provide a pragmatic trace of semantic relations between two entities, and that such semantic relations, their subjects and objects, can be revealed by processing their linguistic traces (i.e. the sentences that embed the hyperlinks) and formalised as Semantic Web triples and ontology axioms. The Linked Data movement [2] realised the first substantiation of this vision by bootstrapping the publication of machine understandable information. Current KE systems address the task of linking pieces of text to Semantic Web entities very well (e.g. linking the mention “Lisp” to dbpedia:Lisp_(programming_language)). Some of them (e.g. NERD) perform sense tagging, i.e. adding knowledge about entity types (rdf:type). A link to “Usenet” in the Wikipedia page of “John McCarthy” suggests a semantic relation between those two entities, which is explained by the sentence: “McCarthy often commented on world affairs on the Usenet forums”
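The sense tagging step mentioned above can likewise be pictured as a triple. The sketch below, again using rdflib, shows what such an rdf:type assertion looks like for the Lisp entity named in the summary; the choice of dbo:ProgrammingLanguage as its class is an assumption made for illustration.

```python
# Sketch of sense-tagging output: systems such as NERD attach rdf:type
# knowledge to the entities they link. The class used here
# (dbo:ProgrammingLanguage) is an illustrative assumption.
from rdflib import Graph, Namespace, RDF

DBR = Namespace("http://dbpedia.org/resource/")
DBO = Namespace("http://dbpedia.org/ontology/")

g = Graph()
g.bind("dbr", DBR)
g.bind("dbo", DBO)

# rdf:type assertion for the entity mentioned in the summary:
g.add((DBR["Lisp_(programming_language)"], RDF.type, DBO["ProgrammingLanguage"]))

print(g.serialize(format="turtle"))
```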