Abstract

A lack of available multiomics datasets has slowed progress in understanding the relationship between genomic variants, gene expression, and disease progression in inflammatory bowel disease (IBD). Biobanks provide an opportunity to produce population-scale datasets to accelerate understanding of these diseases. Ovation’s IBD Omic Data is a multiomic dataset that includes whole-genome sequencing (WGS) and RNA sequencing (RNA-seq) from disease and normal tissues, derived from clinical biopsies of patients with IBD. The omic data are supplemented with patient phenotypic data, including diagnoses, therapies, procedures, and routine laboratory results. We report initial results from the first 212 patients in this cohort, with two aims: to identify genes differentially expressed between disease and normal colon tissue in patients with IBD, and to explore variants in these genes in both the Ovation IBD Omic Data and UK Biobank (UKBB) genomic data.

To generate the Ovation IBD Omic Data, DNA and RNA were extracted from formalin-fixed paraffin-embedded (FFPE) tissue samples of 212 IBD patients. WGS was performed at a depth of 30X. RNA-seq was performed at a depth of 30M 1x50-bp reads per sample, and subsequent analysis used the STAR aligner, featureCounts, and DESeq2. As a benchmark, we curated from the literature a list of 54 genes commonly implicated in IBD. WGS data in Variant Call Format (VCF) were ingested into the REVEAL platform along with the phenotypic data variables. Minor allele frequencies (MAF) were calculated, and annotations were generated using the Ensembl Variant Effect Predictor (VEP).

For the benchmark genes, we compared the number of unique genomic variants in coding regions in the Ovation IBD Omic Data with the number in the UKBB dataset. Despite the smaller sample size, the Ovation data included a median of ~3,600 more unique variants per gene, and ~85% of the genes had more variants identified in the Ovation population of 212.
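The MAF calculation and the per-gene variant comparison described above can be sketched as follows. This is a minimal illustration, not the REVEAL platform's implementation: the function names, the biallelic-site assumption, and the record layout `(gene, chrom, pos, ref, alt)` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from statistics import median


def minor_allele_frequency(alt_allele_count, total_alleles):
    """MAF at a biallelic site: the frequency of the less common allele.

    Assumes exactly two alleles (ref and alt); multiallelic sites
    would need per-allele handling.
    """
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    alt_freq = alt_allele_count / total_alleles
    return min(alt_freq, 1.0 - alt_freq)


def unique_variants_per_gene(records):
    """Count distinct variants per gene.

    records: iterable of (gene, chrom, pos, ref, alt) tuples
    (a hypothetical layout). The same variant seen in several
    samples is counted once.
    """
    seen = defaultdict(set)
    for gene, chrom, pos, ref, alt in records:
        seen[gene].add((chrom, pos, ref, alt))
    return {gene: len(variants) for gene, variants in seen.items()}


def compare_cohorts(counts_a, counts_b):
    """Return (median per-gene difference a - b, share of genes where a > b)."""
    genes = sorted(set(counts_a) | set(counts_b))
    if not genes:
        raise ValueError("no genes to compare")
    diffs = [counts_a.get(g, 0) - counts_b.get(g, 0) for g in genes]
    share_more = sum(d > 0 for d in diffs) / len(genes)
    return median(diffs), share_more
```

Applied to per-gene counts from two cohorts, `compare_cohorts` yields the two kinds of summary statistic quoted above: a median difference in unique variants per gene and the fraction of genes with more variants in the first cohort.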
Analysis of global gene expression levels in Ovation’s IBD dataset showed that 18 genes from the benchmark had significantly different expression between disease and normal tissue (p-value <0.001) across all IBD patients. We also analyzed the subset of samples from patients with Crohn's disease (CD; n=58). Notably, SLC4A4 and IFITM3 did not pass the original threshold in the full analysis (p-values >0.001); however, they were statistically significant when the analysis was limited to CD patients (p-value <0.0001). Comparing datasets of varying complexity shows that, despite a smaller sample size, specialized data derived from clinical samples, such as the Ovation IBD Omic Data, can provide additional insights into complex diseases like IBD. Further analysis will focus on the relationship between the unique genomic variants identified, expression profiles, and associated metadata to identify candidates of potential clinical relevance.
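The comparison between the full-cohort and CD-only results can be sketched as a simple set difference over per-gene p-values. This is an illustrative sketch only: the function names and the dictionary input are assumptions, and the abstract does not state whether the thresholds apply to raw or adjusted p-values, so the code takes whatever p-values it is given. The default thresholds mirror the two cutoffs quoted above.

```python
def significant_genes(p_values, threshold):
    """Return the set of genes whose p-value is below the threshold.

    p_values: dict mapping gene name -> p-value (hypothetical input,
    e.g. taken from a differential-expression results table).
    """
    return {gene for gene, p in p_values.items() if p < threshold}


def significant_only_in_subset(full_p, subset_p,
                               full_threshold=0.001,
                               subset_threshold=0.0001):
    """Genes that pass the subset threshold but not the full-cohort one."""
    in_full = significant_genes(full_p, full_threshold)
    in_subset = significant_genes(subset_p, subset_threshold)
    return sorted(in_subset - in_full)
```

Genes returned by `significant_only_in_subset` are those whose signal emerges only when the analysis is restricted to the subset, which is the pattern reported for SLC4A4 and IFITM3.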
