BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations.

Kyubum Lee, Jaewoo Kang, Aik Choon Tan, Kwanghun Choi, Sungjoon Park, Suhkyung Kim, Sunkyu Kim, Sunwon Lee

doi:10.1093/database/baw043


Open Access


Abstract

Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, is increasing rapidly. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers are working on improved automated biomedical natural language processing (BioNLP) methods that extract useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome this limitation, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be used to evaluate and train new methods for extracting biomedical entity relations from full-text publications, and is thus a valuable resource for the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance.

Database URL: http://infos.korea.ac.kr/bronco
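The abstract says BRONCO can be used to evaluate extraction methods quantitatively, but it does not describe the scoring procedure. The following is a minimal sketch, assuming relations are compared by exact match on every field and scored with precision, recall and F1 (a common choice, not confirmed for BRONCO). The `Relation` record, its field names, the `evaluate` helper and the example entries are all hypothetical illustrations, not the BRONCO file format or data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical relation record: one variant linked to its gene and,
    optionally, a disease, drug and cell line from the same article.
    Field names are illustrative, not the BRONCO file format."""
    article_id: str
    variant: str
    gene: str
    disease: Optional[str] = None
    drug: Optional[str] = None
    cell_line: Optional[str] = None


def evaluate(gold: set[Relation], predicted: set[Relation]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of predicted relations against gold.

    A prediction is a true positive only if every field equals a gold
    relation (frozen dataclasses are hashable, so set intersection works)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative records only; they are not taken from BRONCO.
gold = {
    Relation("doc1", "V600E", "BRAF", disease="melanoma", drug="vemurafenib"),
    Relation("doc1", "L858R", "EGFR", disease="lung cancer"),
    Relation("doc2", "T790M", "EGFR", drug="gefitinib"),
}
predicted = {
    Relation("doc1", "V600E", "BRAF", disease="melanoma", drug="vemurafenib"),
    Relation("doc1", "L858R", "EGFR", disease="lung cancer"),
    Relation("doc2", "T790M", "EGFR"),  # drug missed, so no exact match
}

# Two of three predictions match exactly, so precision, recall and F1 are all 2/3.
scores = evaluate(gold, predicted)
print(scores)
```

Exact matching is strict: a tool that finds the right variant and gene but misses the drug gets no credit. A looser scheme might compare only the (variant, gene) pair; which criterion fits depends on the evaluation question.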

Highlights

Next-generation sequencing (NGS) technologies have revolutionized modern biomedical research
We developed BRONCO (Biomedical entity Relation ONcology COrpus), a variant-centric data set with related genes, diseases, drugs and cell lines
The curation guidelines and the example curation file provided to the curators are attached as supplementary files

Summary

Introduction

Next-generation sequencing (NGS) technologies have revolutionized modern biomedical research. Cancer genomics studies that use NGS have identified novel somatic alterations, such as single-nucleotide variants, insertions and deletions, copy number aberrations, structural variants and gene fusions, as actionable targets in cancer. Variant annotation is a key step in the analysis of cancer genomics data. Although many thousands of cancer genomes and exomes have been sequenced, variant annotation efforts have not kept pace with the number of identified variants. The functional annotation of variants can profoundly affect the conclusions of disease studies: incorrect or incomplete annotations could cause researchers to overlook disease-relevant variants or to label interesting variants as false positives.



Journal: Database
Publication Date: Jan 1, 2016
Citations: 37
License type: CC BY 4.0


Similar Papers

A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools
Karin Verspoor ... Christophe Roeder
BMC Bioinformatics | VOL. 13 | 17 Aug 2012

Improving the robustness and accuracy of biomedical language models through adversarial training
Milad Moradi ... Matthias Samwald
Journal of Biomedical Informatics | VOL. 132 | 15 Jun 2022

Exploring the Latest Highlights in Medical Natural Language Processing across Multiple Languages: A Survey.
Anastassia Shaitarova ... Alberto Lavelli
Yearbook of Medical Informatics | VOL. 32 | 01 Aug 2023

Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT
Usman Naseem ... Matloob Khushi
BMC Bioinformatics | VOL. 23 | 21 Apr 2022
