Abstract
Background: The rise of ‘big data’ in inflammatory bowel disease (IBD) presents an opportunity to improve understanding of pathogenesis and unpick the molecular complexity of this heterogeneous condition. Personalisation of IBD management relies on predicting outcomes and response to therapy, and on preventing complications. Here, we present results on subgrouping patients and predicting outcomes using multiomic and clinical data.
Methods: Using whole exome sequencing from 1100 patients in the Southampton IBD cohort, including 650 paediatric cases, we performed iterative studies focused on 1) the impact of genomic variation across the NOD-signaling pathway, measured by perturbation of transcription across multiple genes, 2) development of NOD2 as a genomic biomarker of stricturing Crohn’s disease (CD), and 3) the use of machine learning and genomic data to develop disease classification models. These analyses use GenePy, a tool developed in-house that summarises genomic variation as a per-individual, per-gene deleteriousness metric.
Results: Within the NOD-signaling pathway, patients harbouring deleterious variation in NOD2 had reduced NOD2 expression and increased NFKBIA expression, reflecting reduced NFKB signaling (figure 1A). Deleterious variation in several key complexes, including NOD2-RIPK2 and TAK1-TAB, resulted in reduced transcription of NFKB activators and activation of alternative inflammatory pathways (figure 1C-D). Using genomic data, we constructed a NOD2-based prediction model for stricturing behaviour in CD; 56.7% of patients in the ‘high-risk group’ had stricturing behaviour, whilst only 21.4% of the low-risk group had strictures. Adding terminal ileal (TI) disease to the NOD2 risk groups significantly improved prediction (figure 2A). In survival modelling, paediatric patients in the high-risk group presenting with TI disease had a hazard ratio (HR) of 4.89 (P = 2.3 × 10⁻⁵) compared with low-risk group patients without TI disease (figure 2B).
Finally, we used supervised machine learning on genomic data to classify patients as having CD or ulcerative colitis. We tested different gene lists and assessed how accurately each assigned patients to their diagnosis. An autoimmune gene panel produced the best model (AUROC 0.68), compared with an IBD gene panel (AUROC 0.61). NOD2 was the most discriminating gene in every gene panel.
Conclusion: These iterative projects demonstrate the utility of integrating genomic and clinical data to improve the subtyping of patients with IBD and to provide disease prediction models. Future work will include analyses of additional inflammatory pathways and will target different clinical outcomes. We hope clinical translation of these findings will be a step change in precision medicine for IBD.
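For readers less familiar with the AUROC values quoted in the Results, the sketch below shows one common way to interpret the metric: as the probability that a randomly chosen patient from one class receives a higher model score than a randomly chosen patient from the other. It is an illustration only, not the authors' pipeline; the function name `auroc`, the label coding (1 = CD, 0 = ulcerative colitis) and the toy scores are all assumptions made for the example.

```python
from itertools import product


def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
# Assumed coding: 1 = CD, 0 = ulcerative colitis.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

On this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.68 and 0.61 sit.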