Abstract

Background

Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models has increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported and published in the scientific literature.

Results

Here, we present two new computational methods that use all PubMed articles as domain-specific background knowledge to assist in extracting and curating gene-mutation-drug relations from the literature. The first method uses Biomedical Entity Search Tool (BEST) scores as part of the feature set used to train machine learning classifiers. The second method uses not only the BEST scores but also word vectors in a deep convolutional neural network model; the word vectors are constructed from and trained on large document collections such as PubMed abstracts and Google News articles. Using features obtained from both the BEST search engine scores and the word vectors, we extract mutation-gene and mutation-drug relations from the literature with machine learning classifiers such as random forests and deep convolutional neural networks. Our methods achieved better results than state-of-the-art methods. Using our proposed features in a simple machine learning model, we obtained F1-scores of 0.96 and 0.82 for mutation-gene and mutation-drug relation classification, respectively. We also developed a deep learning classification model using convolutional neural networks, BEST scores, and word embeddings pre-trained on PubMed or Google News data. With deep learning, the F1-score for mutation-drug relations improved to 0.86, while the F1-score for mutation-gene relations remained at 0.96.

Conclusion

We believe that the computational methods described in this research could serve as an important tool for identifying molecular biomarkers that predict drug responses in cancer patients. We also built a database of the mutation-gene-drug relations extracted from all PubMed abstracts, which we believe can prove to be a valuable resource for precision medicine researchers.
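The deep learning model described above combines word vectors with BEST scores inside a convolutional network. As a minimal sketch, assuming toy two-dimensional word vectors and hand-picked filter weights (a trained model would learn its filters and use pre-trained embeddings of much higher dimension), the forward pass of one convolution layer with ReLU and max-over-time pooling can be written in plain Python. The function names and the `best_scores` argument, a stand-in for BEST-derived scores, are hypothetical; this does not reproduce the authors' architecture.

```python
from typing import List, Sequence

Vector = List[float]


def conv_max_pool(sentence: Sequence[Vector], filt: Sequence[Vector], bias: float = 0.0) -> float:
    """Slide one filter (width = len(filt) words) over the sentence embeddings,
    apply ReLU to each window score, and keep the maximum (max-over-time pooling)."""
    width = len(filt)
    if len(sentence) < width:
        raise ValueError("sentence is shorter than the filter width")
    activations = []
    for start in range(len(sentence) - width + 1):
        score = bias
        # Element-wise product of the filter with the window of word vectors, summed.
        for word_vec, filt_row in zip(sentence[start:start + width], filt):
            score += sum(w * f for w, f in zip(word_vec, filt_row))
        activations.append(max(0.0, score))  # ReLU
    return max(activations)


def sentence_features(sentence, filters, best_scores) -> Vector:
    """Concatenate one pooled value per filter with extra scores.
    `best_scores` is a hypothetical stand-in for BEST-derived features."""
    return [conv_max_pool(sentence, f) for f in filters] + list(best_scores)


# Toy input: four words, each a 2-dimensional embedding (illustrative values only).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],    # width-2 filter, strongest on the first two words
    [[-1.0, 0.0], [0.0, -1.0]],  # width-2 filter whose window scores are all negative
]
features = sentence_features(sentence, filters, best_scores=[0.7])
print(features)  # [2.0, 0.0, 0.7]
```

In a full model, many such filters of several widths are learned by backpropagation, and the concatenated feature vector feeds a final classification layer.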

Highlights

  • Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine

  • Identifying molecular biomarkers such as genes with specific mutations to predict the efficacy of a drug in cancer patients is important for the advancement of precision medicine

  • Since the baseline model finds mutation-related entities in a document-level dataset, we designed two models: a machine learning model using features constructed at the document level, and a deep convolutional neural network model using features constructed at the sentence level
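The highlight above distinguishes features built at the document level from features built at the sentence level. One way to see the difference, sketched here under the assumption that entity mentions have already been tagged per sentence by some upstream recognizer (the data layout and function names are hypothetical), is to compare the candidate pairs each granularity produces: document-level pairing links every mutation with every gene in the abstract, while sentence-level pairing only links mentions that share a sentence.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical layout: one dict per sentence, mapping entity type -> mentions.
Sentence = Dict[str, List[str]]
Pair = Tuple[str, str]


def document_level_pairs(doc: List[Sentence], type_a: str, type_b: str) -> Set[Pair]:
    """Pair every type_a mention with every type_b mention anywhere in the document."""
    a_mentions = {m for sent in doc for m in sent.get(type_a, [])}
    b_mentions = {m for sent in doc for m in sent.get(type_b, [])}
    return set(product(a_mentions, b_mentions))


def sentence_level_pairs(doc: List[Sentence], type_a: str, type_b: str) -> Set[Pair]:
    """Pair mentions only when they co-occur in the same sentence."""
    pairs: Set[Pair] = set()
    for sent in doc:
        pairs.update(product(sent.get(type_a, []), sent.get(type_b, [])))
    return pairs


# Toy abstract with two sentences (illustrative, not from the paper's dataset).
doc = [
    {"mutation": ["V600E"], "gene": ["BRAF"]},
    {"mutation": ["G12D"], "gene": ["KRAS"]},
]
doc_pairs = document_level_pairs(doc, "mutation", "gene")
sent_pairs = sentence_level_pairs(doc, "mutation", "gene")
print(len(doc_pairs), sorted(sent_pairs))  # 4 [('G12D', 'KRAS'), ('V600E', 'BRAF')]
```

The document-level candidates include cross-sentence pairs such as (V600E, KRAS), so a document-level classifier must learn to reject them; sentence-level candidates are fewer but can miss relations stated across sentence boundaries.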


Summary

Introduction

Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. Precision medicine aims to deliver personalized treatment to individual patients based on their genomic profiles. Identifying molecular biomarkers, such as genes with specific mutations, that predict the efficacy of a drug in cancer patients is therefore important for advancing precision medicine. Large-scale research projects such as Genomics of Drug Sensitivity in Cancer (GDSC) [3], the Cancer Cell Line Encyclopedia (CCLE) [4], and the Cancer Therapeutics Response Portal (CTRP) [5] provide gene-mutation-drug relations for the advancement of personalized medicine. Databases such as ClinVar [6], My Cancer Genome [7], and the MD Anderson Personalized Cancer Therapy Knowledgebase [8] contain gene-mutation-drug relations extracted through manual curation of the literature on clinical studies. Computational methods that automatically extract gene-mutation-drug relations from the literature are urgently needed to assist in this curation process.
