Drug knowledge discovery via multi-task learning and pre-trained models

Dongfang Li, Ying Xiong, Buzhou Tang, Weihua Peng, Qingcai Chen, Baotian Hu

doi:10.1186/s12911-021-01614-7

Abstract

Background

Drug repurposing seeks new indications for approved drugs and is essential for investigating new uses of approved or investigational drugs. The active gene annotation corpus (AGAC), annotated by human experts, was developed to support knowledge discovery for drug repurposing. The AGAC track of the BioNLP Open Shared Tasks, organized at EMNLP-BioNLP 2019, uses this corpus. Its “Selective annotation” property makes the track more challenging than traditional sequence labeling tasks. In this work, we present our methods for trigger word detection (Task 1) and thematic role identification (Task 2) in the AGAC track. As a step toward drug repurposing research, our work can also be applied to large-scale automatic extraction of knowledge from medical text.

Methods

To meet the challenges of the two tasks, we treat Task 1 as medical named entity recognition (NER) that identifies molecular phenomena related to gene mutation, and Task 2 as relation extraction that captures the thematic roles between entities. We exploit pre-trained biomedical language representation models (e.g., BioBERT) in an information extraction pipeline that collects mutation-disease knowledge from PubMed. We design the fine-tuning framework around a multi-task learning technique and extra features, investigate different approaches to consolidating and transferring knowledge from varying sources, and report the performance of our model on the AGAC corpus. Our approach fine-tunes BERT, BioBERT, NCBI BERT, and ClinicalBERT with multi-task learning. Further experiments show the effectiveness of knowledge transfer and of the ensemble integration of the models for the two tasks. We compare the performance of various algorithms and conduct an ablation study on the Task 1 development set to examine the contribution of each component of our method.

Results

Compared with competing methods, our model obtained the highest Precision (0.63), Recall (0.56), and F-score (0.60) in Task 1, ranking first. It outperformed the baseline provided by the organizers by 0.10 in F-score. The model shares the same encoding layers between the named entity recognition and relation extraction parts. With a simple but effective framework, we also obtained the second-highest F-score (0.25) in Task 2.

Conclusions

Experimental results on the AGAC benchmark, which annotates genes with active mutation-centric function changes, show that integrating pre-trained biomedical language representation models (i.e., BERT, NCBI BERT, ClinicalBERT, BioBERT) into an information extraction pipeline with multi-task learning can improve the collection of mutation-disease knowledge from PubMed.
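The abstract reports ensembling models fine-tuned from BERT, BioBERT, NCBI BERT, and ClinicalBERT but does not state how their outputs are merged. The sketch below assumes token-level majority voting over Task 1 BIO tag sequences, which is one common choice and not confirmed as the authors' method. Ties go to the earliest-listed model, and a repair pass turns any orphaned I- tag into a B- tag so the merged sequence stays well-formed. The function names `majority_vote` and `repair_bio` and the example predictions are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


# Hypothetical helper: token-level majority vote across models
# (an assumed ensembling strategy, not necessarily the paper's).
def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    if not predictions:
        raise ValueError("need predictions from at least one model")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")
    merged = []
    for i in range(length):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break: keep the first tag (in model order) that reaches the top count.
        merged.append(next(tag for tag in votes if counts[tag] == top))
    return merged


# Hypothetical helper: an I-X tag that does not continue an X entity becomes B-X.
def repair_bio(tags: Sequence[str]) -> List[str]:
    fixed = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


if __name__ == "__main__":
    # Illustrative predictions for a five-token sentence; not real model output.
    bert = ["O", "B-Var", "I-Var", "O", "B-Gene"]
    biobert = ["O", "B-Var", "I-Var", "O", "O"]
    ncbi_bert = ["O", "O", "I-Var", "O", "B-Gene"]
    print(repair_bio(majority_vote([bert, biobert, ncbi_bert])))
    # ['O', 'B-Var', 'I-Var', 'O', 'B-Gene']
```

Voting on individual tags rather than on whole spans is the simplest option, but it can break entity boundaries, which is why the repair step is included.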

Highlights

Drug repurposing seeks new indications for approved drugs and is essential for investigating new uses of approved or investigational drugs
When fine-tuning Bidirectional Encoder Representations from Transformers (BERT), we found that the model performed better with the BIO tagging scheme than with BIOES
We compare development-set results obtained with different pre-trained models
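The highlight on tagging schemes refers to two ways of turning annotated spans into per-token labels. In BIO, a span starts with B- and continues with I-. BIOES additionally marks the last token of a multi-token span with E- and a single-token span with S-. The sketch below assumes non-overlapping spans given as (start, end-exclusive, label) token offsets. The helper names `to_bio` and `to_bioes`, the label names Var and Gene, and the example sentence are illustrative only.

```python
from typing import List, Tuple

# (start token, end token exclusive, label); spans are assumed not to overlap.
Span = Tuple[int, int, str]


def to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # every following token
    return tags


def to_bioes(n_tokens: int, spans: List[Span]) -> List[str]:
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"      # single-token span
            continue
        tags[start] = f"B-{label}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"          # interior tokens
        tags[end - 1] = f"E-{label}"        # last token of a multi-token span
    return tags


if __name__ == "__main__":
    tokens = ["the", "missense", "mutation", "in", "BRCA1"]
    spans = [(1, 3, "Var"), (4, 5, "Gene")]
    print(to_bio(len(tokens), spans))    # ['O', 'B-Var', 'I-Var', 'O', 'B-Gene']
    print(to_bioes(len(tokens), spans))  # ['O', 'B-Var', 'E-Var', 'O', 'S-Gene']
```

With k entity types, BIO uses 2k + 1 labels and BIOES uses 4k + 1, so BIOES spreads the same training data over more classes. That is one plausible reason, not stated in the highlight, why BIO could work better when fine-tuning BERT.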

Summary

Introduction

Drug repurposing seeks new indications for approved drugs and is essential for investigating new uses of approved or investigational drugs. It is a strategy for identifying uses of approved or investigational drugs that lie beyond the scope of their original medical indication, and it focuses on predicting effective off-label uses of drugs already on the market. PubMed is a significant source for knowledge discovery because it stores a growing number of reports of scientific discoveries, and exploiting this growing literature calls for more automated methods. Using natural language processing techniques to find and mine medication-related information from text (e.g., PubMed) for drug repurposing has been a promising research theme [1,2,3,4]. As a step toward drug repurposing research, our work can be applied to large-scale automatic extraction of knowledge from medical text.

Methods

Results

Discussion

Conclusion


Journal: BMC Medical Informatics and Decision Making
Publication Date: Nov 1, 2021
Citations: 1
License type: open-access
