Abstract

Joint extraction of entities and relations from unstructured text is an essential step in constructing a knowledge base. However, relational facts in such texts are often complicated: many sentences contain overlapping triplets, which keeps joint extraction challenging. This paper proposes a novel Sequence-to-Sequence (Seq2Seq) framework that handles the overlapping issue by modeling triplet extraction as a sequence generation task. Specifically, a cascade structure connects a Transformer and a pointer network to extract entities and relations jointly; sequences are thus generated at the triplet level, which speeds up decoding. In addition, a syntax-guided encoder explicitly integrates the sentence's syntactic structure into the Transformer encoder, helping the encoder attend more accurately to syntax-related words. Extensive experiments on three public datasets, NYT24, NYT29, and WebNLG, demonstrate the validity of the model in comparison with various baselines. With a pre-trained BERT model employed as the encoder, performance improves further: the F1 scores on the three datasets surpass the strongest baseline by 5.7%, 5.6%, and 4.4%.
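To make the pointer-network component of the cascade concrete, the following is a minimal sketch of a single pointing step: the decoder scores every encoder position against a query vector and selects the most likely token position (e.g., the start of a subject span). All names, dimensions, and the toy data here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over token scores.
    e = np.exp(x - x.max())
    return e / e.sum()

def point(hidden, query):
    """One pointer step: score each token's encoder state against the
    query and return the argmax position plus the full distribution."""
    scores = hidden @ query          # shape: (n_tokens,)
    probs = softmax(scores)
    return int(probs.argmax()), probs

# Toy encoder outputs for a 5-token sentence (hidden size 4); a real
# model would produce these with a (syntax-guided) Transformer encoder.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))

# In a cascade, each decoding step has its own query (subject start,
# subject end, object start, ...). Here we use a random stand-in query.
q_subject_start = rng.normal(size=4)
pos, probs = point(H, q_subject_start)
```

In the cascaded decoder described above, several such pointing steps, together with a relation classification step, would jointly emit one full triplet per decoding round.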
