The biomedical relationship corpus of the BioRED track at the BioCreative VIII challenge and workshop.

Rezarta Islamaj, Chih-Hsuan Wei, Po-Ting Lai, Ling Luo, Cathleen Coss, Preeti Gokal Kochar, Nicholas Miliaras, Oleg Rodionov, Keiko Sekiya, Dorothy Trinh, Deborah Whitman, Zhiyong Lu

doi:10.1093/database/baae071

Abstract

The automatic recognition of biomedical relationships is an important step in the semantic understanding of the information contained in the unstructured text of the published literature. The BioRED track at BioCreative VIII aimed to foster the development of such methods by providing participants with the BioRED-BC8 corpus, a collection of 1000 PubMed documents manually curated for diseases, genes/proteins, chemicals, cell lines, gene variants, and species, as well as the pairwise relationships between them: disease-gene, chemical-gene, disease-variant, gene-gene, chemical-disease, chemical-chemical, chemical-variant, and variant-variant. Relationships are further categorized into the following semantic categories: positive correlation, negative correlation, binding, conversion, drug interaction, comparison, cotreatment, and association. Unlike most previous publicly available corpora, all relationships are expressed at the document level rather than the sentence level. As such, the entities are normalized to the corresponding concept identifiers of standardized vocabularies: diseases and chemicals are normalized to MeSH, genes (and proteins) to National Center for Biotechnology Information (NCBI) Gene, species to NCBI Taxonomy, cell lines to Cellosaurus, and gene/protein variants to the Single Nucleotide Polymorphism Database (dbSNP). Finally, each annotated relationship is marked as 'novel' or not, depending on whether it is a novel finding or experimental verification in the publication in which it is expressed. This distinction helps differentiate novel findings from other relationships in the same text that provide known facts and/or background knowledge. The BioRED-BC8 corpus uses the previous BioRED corpus of 600 PubMed articles as the training dataset and includes a set of 400 newly published articles to serve as the test data for the challenge. All test articles were manually annotated for the BioCreative VIII challenge by expert biocurators at the National Library of Medicine, using the original annotation guidelines; each article was doubly annotated in a three-round annotation process until all curators reached full agreement. This manuscript details the characteristics of the BioRED-BC8 corpus as a critical resource for biomedical named entity recognition and relation extraction. Using this new resource, we have demonstrated advancements in biomedical text-mining algorithm development. Database URL: https://codalab.lisn.upsaclay.fr/competitions/16381.
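The annotation schema described in the abstract can be sketched as a small data structure. This is an illustrative sketch only, not the corpus's actual file format or any official tooling: the class name `Relation`, its field names, the snake_case labels, and the helper `is_valid` are all assumptions introduced here, and the identifiers in the usage lines are placeholders rather than real corpus entries.

```python
from dataclasses import dataclass

# Entity types and the vocabulary each is normalized to (from the abstract).
# The snake_case keys are this sketch's own labels, not official corpus labels.
VOCABULARY = {
    "disease": "MeSH",
    "chemical": "MeSH",
    "gene": "NCBI Gene",
    "species": "NCBI Taxonomy",
    "cell_line": "Cellosaurus",
    "variant": "dbSNP",
}

# The eight entity-type pairs that may hold a relationship (order-insensitive).
# A same-type pair such as gene-gene collapses to a one-element frozenset.
RELATION_PAIRS = {
    frozenset(pair)
    for pair in [
        ("disease", "gene"),
        ("chemical", "gene"),
        ("disease", "variant"),
        ("gene", "gene"),
        ("chemical", "disease"),
        ("chemical", "chemical"),
        ("chemical", "variant"),
        ("variant", "variant"),
    ]
}

# The eight semantic categories listed in the abstract.
RELATION_TYPES = {
    "positive_correlation",
    "negative_correlation",
    "binding",
    "conversion",
    "drug_interaction",
    "comparison",
    "cotreatment",
    "association",
}


@dataclass(frozen=True)
class Relation:
    """One document-level relationship between two normalized entities (hypothetical layout)."""
    doc_id: str         # document identifier, e.g. a PubMed ID
    type1: str          # entity type of the first concept
    id1: str            # concept identifier in the matching vocabulary
    type2: str          # entity type of the second concept
    id2: str            # concept identifier in the matching vocabulary
    relation_type: str  # one of RELATION_TYPES
    novel: bool         # True for a novel finding, False for background knowledge


def is_valid(rel: Relation) -> bool:
    """Check that the entity-type pair and the semantic category fit the schema."""
    pair_ok = frozenset((rel.type1, rel.type2)) in RELATION_PAIRS
    return pair_ok and rel.relation_type in RELATION_TYPES


# Usage with placeholder identifiers (not real corpus entries):
example = Relation("DOC-0", "gene", "GENE-ID-1", "disease", "DISEASE-ID-1",
                   "association", novel=True)
print(is_valid(example))
```

Because relationships are annotated at the document level, the sketch keys each record on concept identifiers rather than on text spans; two mentions of the same gene in different sentences would map to the same `id1`.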

Journal: Database : the journal of biological databases and curation
Publication Date: Aug 9, 2024
Citations: 1
