Abstract

Relation extraction has benefited from distant supervision in recent years, helped by advances in natural language processing and the rapid growth of available data. However, distant supervision is still severely limited by the quality of its training data, a consequence of its core motivation: avoiding the heavy cost of manual annotation. In this paper, we construct an architecture called MIML-sort (Multi-instance Multi-label Learning with Sorting Strategies), which is built on the well-known MIML framework. Based on MIML-sort, we propose three ranking-based methods for sample selection, with which we learn relation extractors from a subset of the training data. Experiments are conducted on the KBP (Knowledge Base Population) corpus, a large and noisy benchmark dataset for distant supervision. Compared with previous work, the proposed methods produce considerably better results. Furthermore, the three methods combined achieve the best F1 on the official test set, raising F1 from 27.3% to 29.98% in the best case.

Highlights

  • Relation extraction aims at predicting the semantic relation between a pair of named entities, such as the govern_of relation between a PERSON and an ORGANIZATION, or the date_of_birth relation between a DATE and a PERSON [1,2,3,4,5]

  • Compared with MIML-semi, a method built on multi-instance multi-label learning (MIML), our approach differs in four ways: (1) MIML-semi considers only information missing from the knowledge base (KB), not from the text; (2) it models incorrect group-level labels explicitly by hard-coding the percentage of positive groups for each relation, while we model them implicitly through ranking scores; (3) it changes the group-level labels before each Expectation Maximization (EM) training step, while we remove the groups whose labels are likely to be wrong; (4) it tunes parameters only to optimize the Precision-Recall curve (P-R curve), while we optimize both the P-R curve and the F1 value on the official test set (Final F1)

  • We further find that MIML-sort(s) did not remove any group containing more than 10 instances
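The idea in the second highlight, removing groups whose labels are likely to be wrong on the basis of ranking scores, can be illustrated with a minimal sketch. The three ranking methods themselves are not described in this excerpt, so the `score` field, the `keep_ratio` parameter, and the function name `filter_groups_by_rank` are all hypothetical placeholders rather than the authors' definitions.

```python
# Minimal sketch of ranking-based group selection (not the paper's actual method).
# Each group is an entity pair with its KB labels and a placeholder score;
# how the score is computed is an assumption left outside this sketch.

GROUPS = [
    {"pair": ("e1_a", "e2_a"), "labels": {"rel_x"}, "score": 0.9},
    {"pair": ("e1_b", "e2_b"), "labels": {"rel_x"}, "score": 0.2},
    {"pair": ("e1_c", "e2_c"), "labels": {"rel_y"}, "score": 0.6},
    {"pair": ("e1_d", "e2_d"), "labels": {"rel_y"}, "score": 0.4},
]


def filter_groups_by_rank(groups, score_fn, keep_ratio):
    """Sort groups by score (highest first) and keep the top fraction."""
    ranked = sorted(groups, key=score_fn, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]


kept = filter_groups_by_rank(GROUPS, score_fn=lambda g: g["score"], keep_ratio=0.75)
for group in kept:
    print(group["pair"], group["score"])
```

With these toy values the lowest-scored group is dropped, and the remaining groups would form the training subset for the relation extractor.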


Summary

Introduction

Relation extraction aims at predicting the semantic relation between a pair of named entities, such as the govern_of relation between a PERSON and an ORGANIZATION, or the date_of_birth relation between a DATE and a PERSON [1,2,3,4,5]. With the explosive growth of information on the web, it is no longer feasible for human experts to annotate data extensively: new information accumulates far faster than it can be labeled, and annotation requires a great deal of time and effort. Distant supervision (DS) alleviates this problem by aligning existing knowledge bases (KB) with large volumes of free text. DS assumes that a sentence expresses the relation r if it mentions both entities e1 and e2 that the KB links by r [10,11,12]. DS still faces two main challenges: (1) the exact mappings between sentences and relations are not known in advance; and (2) neither the KB nor the free text is guaranteed to be complete.
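The DS assumption can be made concrete with a small sketch. The toy knowledge base, the sentences, and the function name `build_groups` below are illustrative assumptions, and entity matching is reduced to plain substring search; a real pipeline would rely on named entity recognition and entity linking.

```python
from collections import defaultdict

# Toy knowledge base (hypothetical data): entity pair -> set of relation labels.
KB = {
    ("Alice Smith", "Acme Corp"): {"employee_of"},
    ("Alice Smith", "Paris"): {"city_of_birth"},
}

# Toy corpus (hypothetical data).
SENTENCES = [
    "Alice Smith joined Acme Corp in 2010 .",
    "Alice Smith criticized Acme Corp on Tuesday .",
    "Alice Smith lives in Boston .",
]


def build_groups(kb, sentences):
    """Align the KB with text: every sentence mentioning both entities of a
    KB pair is collected into that pair's group, which inherits the pair's
    relation labels (the distant supervision assumption)."""
    mentions_by_pair = defaultdict(list)
    for sentence in sentences:
        for e1, e2 in kb:
            # Naive substring matching stands in for real entity recognition.
            if e1 in sentence and e2 in sentence:
                mentions_by_pair[(e1, e2)].append(sentence)
    return {pair: (mentions, kb[pair]) for pair, mentions in mentions_by_pair.items()}


groups = build_groups(KB, SENTENCES)
for pair, (mentions, labels) in groups.items():
    print(pair, sorted(labels), len(mentions))
```

Both challenges show up even in this toy case. The second sentence lands in the employee_of group although it does not express that relation, and the city_of_birth fact yields no group at all because no sentence mentions both of its entities.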
