Building Distant Supervised Relation Extractors

Thiago Nunes, Daniel Schwabe

doi:10.1109/icsc.2014.15

Abstract

A well-known obstacle to building machine learning detectors of semantic relations in natural language is the scarcity of high-quality training instances for the target relations across multiple languages. Even when good results are achieved, the datasets used by state-of-the-art approaches are rarely published. To address these problems, this work presents an automatic approach to building multilingual semantic relation detectors through distant supervision, combining two of the largest resources of structured and unstructured content available on the Web: DBpedia and Wikipedia. We map the DBpedia ontology back to the Wikipedia text to extract more than 100,000 training instances for more than 90 DBpedia relations in English and Portuguese, without human intervention. First, we mine Wikipedia articles to find candidate instances for relations described in the DBpedia ontology. Second, we preprocess and normalize the data, filtering out irrelevant instances. Finally, we use the normalized data to train regularized logistic regression detectors that achieve an F-measure above 80% for both English and Portuguese. We also compare the impact of different feature types on the accuracy of the trained detectors, demonstrating significant performance improvements when lexical, syntactic, and semantic features are combined. Both the datasets and the code used in this research are available online.
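The first step described in the abstract, aligning known relation instances with text to obtain training data, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge base, the sentences, and the helper names `label_sentences` and `words_between` are all hypothetical, and plain substring matching stands in for whatever entity matching the actual system uses.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation name.
# In the paper's setting such pairs would come from DBpedia; these are illustrative only.
KB: Dict[Tuple[str, str], str] = {
    ("Rio de Janeiro", "Brazil"): "country",
    ("Machado de Assis", "Rio de Janeiro"): "birthPlace",
}

# Hypothetical sentences standing in for Wikipedia article text.
SENTENCES: List[str] = [
    "Machado de Assis was born in Rio de Janeiro.",
    "Rio de Janeiro is a city in Brazil.",
    "Brazil won the match.",
]


def label_sentences(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Distant-supervision heuristic: a sentence mentioning both entities
    of a known pair is taken as a (possibly noisy) instance of that relation."""
    instances = []
    for sentence in sentences:
        for (subj, obj), relation in kb.items():
            if subj in sentence and obj in sentence:
                instances.append((sentence, subj, obj, relation))
    return instances


def words_between(sentence: str, subj: str, obj: str) -> List[str]:
    """A simple lexical feature: the tokens between the two entity mentions."""
    i, j = sentence.find(subj), sentence.find(obj)
    if i < j:
        middle = sentence[i + len(subj):j]
    else:
        middle = sentence[j + len(obj):i]
    return middle.split()


if __name__ == "__main__":
    for sentence, subj, obj, relation in label_sentences(SENTENCES, KB):
        print(relation, words_between(sentence, subj, obj))
```

Because the heuristic labels any sentence containing both entities, it can produce false positives, which is one reason a filtering and normalization step like the one the abstract mentions is needed before training a classifier.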

