Relation Extraction Datasets Research Articles

A biomedical relation statement is commonly expressed in multiple sentences and consists of many concepts, including gene, disease, chemical and mutation. To automatically extract information from biomedical literature, existing biomedical text-mining approaches typically formulate the problem as a cross-sentence n-ary relation-extraction task that detects relations among n entities across multiple sentences, and use either a graph neural network (GNN) with long short-term memory (LSTM) or an attention mechanism. Recently, the Transformer has been shown to outperform LSTM on many natural language processing (NLP) tasks. In this work, we propose a novel architecture that combines Bidirectional Encoder Representations from Transformers with Graph Transformer (BERT-GT) by integrating a neighbor-attention mechanism into the BERT architecture. Unlike the original Transformer architecture, which uses the whole sentence(s) to calculate the attention of the current token, the neighbor-attention mechanism in our method calculates a token's attention using only its neighbor tokens. Thus, each token can attend to its neighbor information with little noise. We show that this is critically important when the text is very long, as in cross-sentence or abstract-level relation-extraction tasks. Our benchmarking results show improvements of 5.44% and 3.89% in accuracy and F1-measure over the state-of-the-art on n-ary and chemical-protein relation datasets, suggesting BERT-GT is a robust approach that is applicable to other biomedical relation extraction tasks or datasets. The source code of BERT-GT will be made freely available at https://github.com/ncbi/bert_gt upon publication. Supplementary data are available at Bioinformatics online.
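The neighbor-attention idea in the abstract above can be illustrated with a minimal sketch: scaled dot-product attention in which each token's softmax runs only over its neighbor tokens rather than the whole input. This is not the BERT-GT implementation. The function name `neighbor_attention`, the `neighbors` mapping, and the requirement that every neighbor set is non-empty are assumptions made for illustration; the abstract does not say how neighbors are chosen.

```python
import math


def neighbor_attention(queries, keys, values, neighbors):
    """Scaled dot-product attention restricted to each token's neighbors.

    queries, keys, values: one float vector (list) per token.
    neighbors: neighbors[i] is the collection of token indices that token i
    may attend to. This input is hypothetical; the source does not define
    how the neighbor set is built.
    """
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        allowed = sorted(neighbors[i])
        if not allowed:
            raise ValueError(f"token {i} has no neighbors to attend to")
        # Score only the neighbor tokens; every other token is ignored.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in allowed
        ]
        # Numerically stable softmax over the neighbor scores.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the neighbor value vectors.
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, allowed))
            for d in range(len(values[0]))
        ])
    return outputs
```

Full self-attention is the special case in which every token's neighbor set contains all token indices, so the restriction changes only which tokens enter the softmax.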


Biomedical relation extraction (RE) datasets are vital for constructing knowledge bases and for enabling the discovery of new interactions. There are several ways to create biomedical RE datasets, some more reliable than others, such as resorting to domain expert annotations. However, the emerging use of crowdsourcing platforms, such as Amazon Mechanical Turk (MTurk), can potentially reduce the cost of RE dataset construction, even if the same level of quality cannot be guaranteed. Researchers have little control over who engages in crowdsourcing platforms, how, and in what context. Hence, combining distant supervision with crowdsourcing can be a more reliable alternative. The crowdsourcing workers would be asked only to rectify or discard already existing annotations, which would make the process less dependent on their ability to interpret complex biomedical sentences. In this work, we use a previously created distantly supervised human phenotype–gene relations (PGR) dataset to perform crowdsourcing validation. We divided the original dataset into two annotation tasks: Task 1, 70% of the dataset annotated by one worker, and Task 2, 30% of the dataset annotated by seven workers. For Task 2, we also added an extra on-site rater and a domain expert to further assess the crowdsourcing validation quality. Here, we describe a detailed pipeline for RE crowdsourcing validation, create a new release of the PGR dataset with partial domain expert revision, and assess the quality of the MTurk platform. We applied the new dataset to two state-of-the-art deep learning systems (BiOnt and BioBERT) and compared its performance with the original PGR dataset, as well as combinations of the two, achieving a 0.3494 increase in average F-measure. The code supporting our work and the new release of the PGR dataset is available at https://github.com/lasigeBioTM/PGR-crowd.
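The two-task design in the abstract above can be sketched as follows: one helper divides candidate annotations into a 70% portion and a 30% portion, and another combines several workers' votes on a single item. This is a rough illustration under stated assumptions, not the authors' pipeline. The random shuffle, the fixed seed, the majority-vote rule, and the names `split_tasks` and `majority_label` are all assumptions; the abstract gives only the 70/30 proportions and the worker counts (one and seven).

```python
import random
from collections import Counter


def split_tasks(items, task1_fraction=0.7, seed=0):
    """Divide items into Task 1 and Task 2 portions.

    Shuffling with a fixed seed is an assumption; the source states only
    the 70%/30% proportions, not how items were assigned.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * task1_fraction)
    return shuffled[:cut], shuffled[cut:]


def majority_label(votes):
    """Return the most frequent vote, or None for an empty list or a tie.

    Majority voting is an assumed aggregation rule; the source does not
    say how the seven Task 2 judgments were combined.
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

With an odd number of workers and two possible labels, as in a seven-worker keep-or-discard check, a tie cannot occur, so `majority_label` always returns a label in that setting.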


Articles published on Relation Extraction Datasets

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling

Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction

Re-TACRED: Addressing Shortcomings of the TACRED Dataset

DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction

Representation iterative fusion based on heterogeneous graph neural network for joint entity and relation extraction

Research on Relation Extraction Method Based on Similar Relations and Bayesian Neural Network

BERT-GT: cross-sentence n-ary relation extraction with BERT and Graph Transformer.

Chinese Relation Extraction Using Extend Softword

Improving Distantly-Supervised Relation Extraction Through BERT-Based Label and Instance Embeddings

Utilizing Entity-Based Gated Convolution and Multilevel Sentence Attention to Improve Distantly Supervised Relation Extraction.

A Graph Convolutional Network With Multiple Dependency Representations for Relation Extraction

A hybrid approach toward biomedical relation extraction training corpora: combining distant supervision with crowdsourcing.

BioRel: towards large-scale biomedical relation extraction

The impact of learning Unified Medical Language System knowledge embeddings in relation extraction from biomedical texts

Improving accessibility and distinction between negative results in biomedical relation extraction.

Joint Entity and Relation Extraction with a Hybrid Transformer and Reinforcement Learning Based Model

An Attention-Based Model Using Character Composition of Entities in Chinese Relation Extraction

A Novel Document-Level Relation Extraction Method Based on BERT and Entity Information

Bootstrapping Knowledge Graphs From Images and Text.

An input information enhanced model for relation extraction
