Abstract

Distant supervision is an effective method to automatically collect large-scale datasets for relation extraction (RE). However, automatically constructed datasets usually contain two types of noise: intrasentence noise and wrongly labeled sentences. To address the issues caused by these two types of noise and improve distantly supervised relation extraction, this paper proposes a novel distantly supervised relation extraction model that consists of an entity-based gated convolution sentence encoder and a multilevel sentence selective attention (Matt) module. Specifically, we first apply an entity-based gated convolution operation to force the sentence encoder to extract entity-pair-related features and filter out useless intrasentence noise. Furthermore, the multilevel attention schema fuses bag information to obtain a fine-grained bag-specific query vector, which better identifies valid sentences and reduces the influence of wrongly labeled sentences. Experimental results on a large-scale benchmark dataset show that our model effectively reduces the influence of both types of noise and achieves state-of-the-art performance in relation extraction.
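To ground the two noise types, here is a minimal, illustrative sketch of the distant-supervision labeling assumption in Python: any sentence that mentions both entities of a KG fact triple (h, t, r) is labeled with r and grouped into the bag for that entity pair. The toy triple and corpus below are assumptions for illustration only, not the benchmark data; the second sentence mentions both entities without expressing the relation, which is exactly the wrongly-labeled-sentence noise described above.

```python
# Illustrative distant-supervision labeling: label any sentence containing
# both entities of a KG triple with that triple's relation (toy data only).
from collections import defaultdict

kg_triples = [("Dick Ebersol", "NBC Universal Sports", "/business/person/company")]

corpus = [
    "The problem might have been that the family was in NBC's suite, "
    "but Dick Ebersol, the chairman of NBC Universal Sports, said by telephone that ...",
    # Hypothetical sentence: mentions both entities but does not state the relation.
    "Dick Ebersol praised the NBC Universal Sports broadcast of the opening ceremony.",
]

bags = defaultdict(list)
for head, tail, relation in kg_triples:
    for sentence in corpus:
        if head in sentence and tail in sentence:
            # The distant-supervision assumption: this sentence expresses `relation`.
            # When the assumption fails, the bag contains a wrongly labeled sentence.
            bags[(head, tail, relation)].append(sentence)

for (head, tail, relation), sentences in bags.items():
    print(relation, "->", len(sentences), "sentence(s) in the bag for", (head, tail))
```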

Highlights

  • In distant supervision, a fact triple (h, t, r) of a given knowledge graph (KG) contains the two entities h and t, where h, t, and r denote head entity, tail entity, and relation, respectively

  • Given the sentence “The problem might have been that the family was in NBC’s suite, but Dick Ebersol, the chairman of NBC Universal Sports, said by telephone that...,” we find that the part that expresses the relationship contained in the triple (“Dick Ebersol,” “NBC Universal Sports,” and “/business/person/company”) is the subsentence “but Dick Ebersol, the chairman of NBC Universal Sports.” The other parts of the sentence are meaningless for relation extraction and may even hinder the performance of the model

  • We propose a novel model for relation extraction to tackle the two types of noise problems introduced by distant supervision. The model is composed of two main modules (a sketch of the sentence encoder follows below)
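As a concrete illustration of the first module, the following is a minimal sketch of an entity-based gated convolution encoder in PyTorch. The layer names, dimensions, and the exact way the gate is conditioned on the head/tail entity embeddings are assumptions for illustration, not the authors' implementation; the idea is that a sigmoid gate computed from the entity pair suppresses convolution features unrelated to the two entities before max pooling.

```python
import torch
import torch.nn as nn


class EntityGatedConvEncoder(nn.Module):
    """Illustrative entity-based gated convolution sentence encoder."""

    def __init__(self, emb_dim=60, hidden_dim=230, kernel_size=3):
        super().__init__()
        # Convolution branch producing candidate relational features.
        self.feature_conv = nn.Conv1d(emb_dim, hidden_dim, kernel_size,
                                      padding=kernel_size // 2)
        # Gate branch, conditioned on the entity pair in forward().
        self.gate_conv = nn.Conv1d(emb_dim, hidden_dim, kernel_size,
                                   padding=kernel_size // 2)
        self.entity_proj = nn.Linear(2 * emb_dim, hidden_dim)

    def forward(self, tokens, head_emb, tail_emb):
        # tokens: (batch, seq_len, emb_dim); head_emb, tail_emb: (batch, emb_dim)
        x = tokens.transpose(1, 2)                       # (batch, emb_dim, seq_len)
        features = self.feature_conv(x)                  # (batch, hidden, seq_len)
        entity = self.entity_proj(torch.cat([head_emb, tail_emb], dim=-1))
        gate = torch.sigmoid(self.gate_conv(x) + entity.unsqueeze(-1))
        gated = features * gate                          # keep entity-pair-related features
        return torch.max(gated, dim=-1).values           # max-pool into a sentence vector
```

Conditioning the gate on the concatenated entity embeddings is what would let such an encoder filter intrasentence noise such as the clauses unrelated to the entity pair in the example sentence above.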

Introduction

A fact triple (h, t, r) of a given knowledge graph (KG) contains the two entities h and t, where h, t, and r denote the head entity, tail entity, and relation, respectively. The main contributions of this work are as follows: (i) to remove the influence of intrasentence noise, we propose an entity-based gated convolution that filters out useless information and extracts entity-pair-related relational features from a sentence; (ii) to address the problem of incorrect labeling, we design a Matt module that generates a bag-specific query vector to assign lower attention scores to noisy sentences, as sketched below; (iii) experimental results on a large-scale benchmark dataset show that our model effectively reduces the influence of both types of noise and achieves state-of-the-art performance in relation extraction.
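The following is a minimal sketch, in PyTorch, of how a bag-specific query can drive selective attention over the sentences of a bag. The fusion scheme shown here (averaging the sentence vectors and combining them with a relation query through a linear layer) is an illustrative stand-in for the paper's multilevel attention (Matt) module, not its exact formulation, and the hyperparameter defaults are placeholders.

```python
import torch
import torch.nn as nn


class BagSelectiveAttention(nn.Module):
    """Illustrative selective attention with a bag-specific query vector."""

    def __init__(self, hidden_dim=230, num_relations=53):
        super().__init__()
        self.relation_query = nn.Embedding(num_relations, hidden_dim)
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, sent_vecs, relation_id):
        # sent_vecs: (num_sentences, hidden_dim) for one bag
        # relation_id: scalar LongTensor holding the bag's relation label
        bag_info = sent_vecs.mean(dim=0)                   # coarse bag-level summary
        query = self.relation_query(relation_id)           # relation-level query
        bag_query = torch.tanh(self.fuse(torch.cat([query, bag_info], dim=-1)))
        scores = sent_vecs @ bag_query                     # one score per sentence
        alpha = torch.softmax(scores, dim=0)               # low weight for noisy sentences
        return alpha @ sent_vecs                           # bag representation
```

The intent is that valid sentences receive higher weights alpha, so the bag representation passed to the relation classifier is dominated by sentences that actually express the labeled relation rather than by wrongly labeled ones.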
