Abstract

The repurposing of biomedical data is inhibited by its fragmented and multi-formatted nature, which requires redundant investment of time and resources by data scientists. This is particularly true for Type 1 Diabetes (T1D), one of the most intensely studied common childhood diseases. The contributions of pancreatic β-islets and T-lymphocytes to T1D have been investigated intensely. However, genetic contributions from B-lymphocytes, which are known to play a role in a subset of T1D patients, remain relatively understudied. We have addressed this issue through the creation of the Biomedical Data Commons (BMDC), a knowledge graph that integrates data from multiple sources into a single queryable format. This increases the speed of analysis by multiple orders of magnitude. We develop a pipeline using B-lymphocyte multi-dimensional epigenome and connectome data and deploy BMDC to assess genetic variants in the context of T1D. Pipeline-identified variants are primarily common, non-coding, poorly conserved, and of unknown clinical significance. While variants and their chromatin connectivity are cell-type specific, they are associated with well-studied disease genes in T-lymphocytes. Candidates include established variants in the HLA-DQB1, HLA-DRB1, and IL2RA loci that have previously been demonstrated to protect against T1D in humans and mice, providing validation for this method. Others are included in the well-established T1D GRS2 genetic risk scoring method. More intriguingly, other prioritized variants are completely novel and form the basis for future mechanistic and clinical validation studies. The BMDC community-based platform can be expanded and repurposed to increase the accessibility, reproducibility, and productivity of biomedical information for diverse applications, including the prioritization of cell type-specific disease alleles from complex phenotypes.

Highlights

  • The explosion over the past decade of high-throughput biomedical genomics data and the universal transition to electronic medical records promises unparalleled disease insights.[1,2,3] However, this promise has been hampered by problems in data sharing and integration, limiting the productivity and impact that any individual biomedical dataset can generate.

  • Biomedical Data Commons integrates multiple data types into a single graph. Genomic, epigenomic, and transcriptomic data from eight databases have been integrated into a queryable knowledge graph (Figs 1A, 1B, and S1, and S1 and S2 Tables).

  • Raw data were converted into Meta Content Format (mapping to ~50.7 billion unique entities and ~50.0 billion triples) and ingested into the Biomedical Data Commons.
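
To make the triple-based representation in the highlights above concrete, the following is a minimal Python sketch of how a single record might be flattened into subject-predicate-object triples, written out as a Meta Content Format style node block, and queried. It is an illustration under stated assumptions, not the BMDC pipeline: the node identifiers (`bio/example_variant`, `bio/IL2RA`), the property names (`typeOf`, `geneSymbol`, `associatedDisease`), and the helper functions are all hypothetical, and the output only approximates the "Node:" plus "property: value" layout of the format.

```python
from typing import Dict, List, Optional, Tuple

# A triple is (subject, predicate, object); all identifiers here are hypothetical.
Triple = Tuple[str, str, str]


def record_to_triples(node_id: str, properties: Dict[str, str]) -> List[Triple]:
    """Flatten one record into triples, one per property of the node."""
    return [(node_id, prop, value) for prop, value in properties.items()]


def triples_to_mcf(triples: List[Triple]) -> str:
    """Group triples by subject and render each group as a 'Node:' block."""
    blocks: Dict[str, List[str]] = {}
    for subject, predicate, obj in triples:
        blocks.setdefault(subject, []).append(f"{predicate}: {obj}")
    return "\n\n".join(
        "\n".join([f"Node: {subject}"] + lines)
        for subject, lines in blocks.items()
    )


def query(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching every field that is not None (None is a wildcard)."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Assumed example record: a placeholder variant linked to the IL2RA locus.
example_triples = record_to_triples(
    "bio/example_variant",
    {
        "typeOf": "GeneticVariant",
        "geneSymbol": "bio/IL2RA",
        "associatedDisease": "Type 1 Diabetes",
    },
)

if __name__ == "__main__":
    print(triples_to_mcf(example_triples))
    print(query(example_triples, predicate="geneSymbol"))
```

Once every source is expressed as triples of this kind, a question such as "which variants are linked to a given gene" reduces to a single pattern match over one store rather than a per-database parsing step, which is the kind of integration the highlights describe.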


Introduction

The explosion over the past decade of high-throughput biomedical genomics data and the universal transition to electronic medical records promises unparalleled disease insights.[1,2,3] However, this promise has been hampered by problems in data sharing and integration, limiting the productivity and impact that any individual biomedical dataset can generate. Genome-wide association studies (GWAS) seek to understand disease pathogenesis by correlating human sequence variation with disease phenotypes,[6,7] but multiple hypothesis testing, linkage disequilibrium, and limited or heterogeneous disease populations limit the resolution of these studies. These issues lead to an overemphasis on variants with relatively rare minor allele frequencies in diseases. Merging multidimensional omics data with disease sequence variation would allow improved functional insights into associative data.

