Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.

Rezarta Islamaj Doğan, Sun Kim, Jinchan Qu, Qingyu Chen, Yanshan Wang, Tung Tran, Karin Verspoor, Rui Antunes, Aparna Elangovan, Zhuang Liu, Jinfeng Zhang, Sérgio Matos, Chen-Kai Wang, Berna Altınel, Chih-Hsuan Wei, Aris Fergadis, Zhiyong Lu, Ling Luo, Zehra Melce Hüsünbeyi, Albert Steppi, Hongfang Liu, Hong-Jie Dai, Arzucan Özgür, Nagesh C Panyam, Andrew Chatr‐Aryamontri, Donald C Comeau

doi:10.1093/database/bay147

Abstract

The Precision Medicine Initiative is a multicenter effort aimed at formulating personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of the BioCreative VI Precision Medicine Track was to extract this particular type of information. The track was organized in two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein–protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the precision medicine track to be successful in engaging the text-mining research community. At the same time, the track produced a manually annotated corpus of 5509 PubMed documents developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
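As a minimal sketch of how the triage metrics reported above (precision, recall, F-score and average precision) are conventionally computed, assuming binary relevance labels and a per-document confidence score: the function names and toy data below are illustrative assumptions and are not taken from the track's official evaluation script, and the average precision shown is the standard non-interpolated form.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for binary labels (1 = relevant document)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(gold: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated average precision over a ranking by descending score."""
    ranked = sorted(zip(scores, gold), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each relevant document
    return total / hits if hits else 0.0


# Hypothetical toy data: five documents, three of them relevant.
gold_labels = [1, 0, 1, 1, 0]
predicted_labels = [1, 1, 1, 0, 0]
confidence = [0.9, 0.8, 0.7, 0.4, 0.2]

p, r, f = precision_recall_f1(gold_labels, predicted_labels)
ap = average_precision(gold_labels, confidence)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f} avg_precision={ap:.3f}")
```

Average precision rewards systems that rank relevant documents near the top, which matters for triage because curators typically read a ranked list rather than a fixed set.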

Highlights

BioCreative challenges [1,2,3,4,5,6,7,8] have historically aimed to bring forth community tasks that result in the development of text-mining systems that can be of practical use to database curators and the users of textual data in the field of biology.
As previously described in [19], each of these PubMed documents was first manually labeled for relevance for the triage task. For the relation extraction task, the subset of PubMed documents that had previously been curated by IntAct/Mint for protein–protein interaction (PPI) relations was annotated with those interacting protein pairs if the interactions were affected by mutations and the interaction was named in the abstract.
In post-challenge analysis, we found that relying on standard bioNLP tools to identify entities relevant to PPI-affected-by-mutation relations is inadequate and that manually defined term lists produce stronger recall than entity-based methods alone, although this effect was dampened by variations in the distribution of mutations in the test set as compared to the training set.

Summary

Introduction

BioCreative challenges [1,2,3,4,5,6,7,8] have historically aimed to bring forth community tasks that result in the development of text-mining systems that can be of practical use to database curators and the users of textual data in the field of biology. In keeping with current needs, community challenges in biomedical natural language processing such as BioNLP and BioASQ [15,16,17,18] have addressed the development of information extraction systems for relevant and emerging research areas. All these tasks have produced quality data sets for training and testing of automated systems, containing abstracts of biomedical scientific publications as well as full text [19,20,21,22,23,24,25]. To efficiently translate this new approach into clinical practice, it is necessary to foster the de novo development of, and access to, knowledge bases (KBs) that store and organize the potential effect of genetic variations on molecular phenotypes.

Journal: Database	Publication Date: Jan 1, 2019
Citations: 37	License type: cc-by
