Abstract

Background: The Genetic and Rare Diseases (GARD) Information Center was established by the National Institutes of Health (NIH) to provide freely accessible consumer health information on over 6500 genetic and rare diseases. As the cumulative scientific understanding and underlying evidence for these diseases have expanded over time, existing practices to generate knowledge from these publications and resources have not been able to keep pace. Through determining the applicability of computational approaches to enhance or replace manual curation tasks, we aim both to improve the sustainability and relevance of consumer health information and to develop a foundational database from which translational science researchers may start to unravel disease characteristics that are vital to the research process.

Results: We developed a meta-ontology-based integrative knowledge graph for rare diseases in Neo4j. This integrative knowledge graph includes a total of 3,819,623 nodes and 84,223,681 relations from 34 different biomedical data resources, including curated drug and rare disease associations. Semi-automatic mappings were generated for 2154 unique FDA orphan designations to 776 unique GARD diseases, and for 3322 unique FDA designated drugs to UNII. These mappings, together with 180,363 associations between drug and indication from Inxight Drugs, were integrated into the knowledge graph. We conducted four case studies to demonstrate the capabilities of this integrative knowledge graph in accelerating the curation of scientific understanding on rare diseases through the generation of disease mappings/profiles and pathogenesis associations.

Conclusions: By integrating well-established database resources, we developed an integrative knowledge graph containing a large volume of biomedical and research data. Demonstration of several immediate use cases and limitations of this process reveals both the feasibility of, and the barriers to, using graph-based resources and approaches to support providers of consumer health information, such as GARD, that may struggle to maintain knowledge reliant on an evolving and growing evidence base. Finally, the successful integration of these datasets into a freely accessible knowledge graph highlights an opportunity to take a translational science view of the field of rare diseases by enabling researchers to identify disease characteristics that may play a role in the translation of discoveries across different research domains.

Highlights

  • An estimated 30 million people in the United States are affected by a rare disease, which is defined as a disease that affects fewer than 200,000 individuals in the United States [1]

  • The Genetic and Rare Diseases (GARD) Information Center was charged with providing freely accessible consumer health information in plain language. It has been investigating how to shift from an entirely manual process to computational approaches for curating the accumulated biomedical and clinical research knowledge on over 6500 rare diseases, so that information becomes accessible more rapidly 1) to educate patients, families, and health care providers with more accurate and real-time knowledge about a rare disease, and 2) to support novel scientific research efforts and apply disease-agnostic translational science approaches to the field of rare diseases as a whole [5]

  • The integrative knowledge graph we introduce in this study uses well-established rare disease data drawn from GARD, Orphanet, Online Mendelian Inheritance in Man (OMIM), and Monarch Disease Ontology (MONDO) as a backbone, and expands to a wide spectrum of additional biomedical data, including phenotypes, genes, curated FDA-approved drugs, and FDA orphan drug designations
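
The backbone idea in the last highlight can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the study builds its graph in Neo4j, whereas this sketch uses plain dictionaries, and every identifier, relation name, and helper function below (`build_backbone`, `attach_annotations`, `GARD:X1`, `has_phenotype`, and so on) is a hypothetical placeholder rather than a real database entry or API. It shows one plausible way to group cross-referenced disease records from several sources into shared concepts, then attach phenotype, gene, and drug edges to those concepts.

```python
from collections import defaultdict

# Hypothetical disease records; IDs and cross-references are placeholders.
disease_records = [
    {"id": "GARD:X1", "source": "GARD", "xrefs": ["ORPHA:X1"]},
    {"id": "ORPHA:X1", "source": "Orphanet", "xrefs": ["OMIM:X1"]},
    {"id": "OMIM:X1", "source": "OMIM", "xrefs": []},
    {"id": "MONDO:X2", "source": "MONDO", "xrefs": []},
]

# Hypothetical typed edges from disease records to other biomedical entities.
annotations = [
    ("GARD:X1", "has_phenotype", "HP:X1"),
    ("OMIM:X1", "associated_gene", "GENE:X1"),
    ("MONDO:X2", "orphan_designation", "DRUG:X1"),
]


def build_backbone(records):
    """Map each disease ID to a representative concept ID using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Deterministic choice: the lexicographically smaller ID is the root.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for record in records:
        find(record["id"])  # register records that have no cross-references
        for xref in record["xrefs"]:
            union(record["id"], xref)
    return {node: find(node) for node in list(parent)}


def attach_annotations(concept_of, edges):
    """Collect annotation targets per concept and relation type."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in edges:
        graph[concept_of[subject]][relation].add(obj)
    return graph


concept_of = build_backbone(disease_records)
graph = attach_annotations(concept_of, annotations)
```

In this toy input, the GARD, Orphanet, and OMIM records collapse into a single concept, so the phenotype edge asserted on the GARD record and the gene edge asserted on the OMIM record end up on the same node. A production graph would additionally need provenance for each edge and a review step for ambiguous cross-references, which this sketch omits.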


Summary

Introduction

An estimated 30 million people in the United States are affected by a rare disease, which is defined as a disease that affects fewer than 200,000 individuals in the United States [1]. Despite the great heterogeneity of diseases included in this definition, many patients and their families share common struggles, such as diagnostic delay: it takes “an average of 7.6 years” from initial onset of symptoms to receive a diagnosis, and the process involves 7.3 physicians on average [4]. These shared challenges in the broader rare disease patient community are often due to a lack of either up-to-date information or awareness amongst providers and the public at large. Through determining the applicability of computational approaches to enhance or replace manual curation tasks, we aim both to improve the sustainability and relevance of consumer health information and to develop a foundational database from which translational science researchers may start to unravel disease characteristics that are vital to the research process.
