Abstract

Background: One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes. Such findings could support refined biological insight and hypothesis generation. However, standard analytical approaches are not designed to be “ome aware.” Thus, some researchers analyze data from one ome at a time and then combine predictions across omes. Others resort to correlation studies, cataloging pairwise relationships but lacking an obvious approach for cohesive and interpretable summaries of these catalogs.

Methods: We present a novel workflow for building predictive regression models from network neighborhoods in multi-omic networks. First, we generate pairwise regression models across all pairs of analytes from all omes, encoding the resulting “top table” of relationships in a network. Then, we build predictive logistic regression models using the analytes in network neighborhoods of interest. We call this method CANTARE (Consolidated Analysis of Network Topology And Regression Elements).

Results: We applied CANTARE to previously published data from healthy controls and patients with inflammatory bowel disease (IBD) consisting of three omes: gut microbiome, metabolomics, and microbial-derived enzymes. We identified 8 unique predictive models with AUC > 0.90. The number of predictors in these models ranged from 3 to 13. We compared the results of CANTARE to random forests and elastic-net penalized regressions, analyzing AUC, predictions, and predictors. CANTARE AUC values were competitive with those generated by random forests and penalized regressions. The top 3 CANTARE models had a greater dynamic range of predicted probabilities than did random forests and penalized regressions (p-value = 1.35 × 10^-5). CANTARE models were significantly more likely to prioritize predictors from multiple omes than were the alternatives (p-value = 0.005). We also showed that predictive models from a network based on pairwise models with an interaction term for IBD have higher AUC than predictive models built from a correlation network (p-value = 0.016). R scripts and a CANTARE User’s Guide are available at https://sourceforge.net/projects/cytomelodics/files/CANTARE/.

Conclusion: CANTARE offers a flexible approach for building parsimonious, interpretable multi-omic models. These models yield quantitative and directional effect sizes for predictors and support the generation of hypotheses for follow-up investigation.
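
The two-stage workflow described in the Methods (pairwise models encoded as a network, then logistic regression on a network neighborhood) can be illustrated with a minimal Python sketch. This is not the authors' R implementation. It makes several simplifying assumptions: edges are kept when the R² of a simple linear regression passes an arbitrary threshold (the paper's pairwise models include an interaction term for IBD, which is omitted here), the data are synthetic, the AUC is computed in-sample, and all function and analyte names are hypothetical.

```python
import math
import random

def r_squared(x, y):
    """R^2 of the simple linear regression y ~ x (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def build_network(data, threshold):
    """Fit every pairwise model; pairs passing the threshold become edges."""
    names = sorted(data)
    edges = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if r_squared(data[a], data[b]) >= threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges

def fit_logistic(columns, labels, lr=0.5, epochs=500):
    """Gradient-descent logistic regression; returns (intercept, weights)."""
    n, p = len(labels), len(columns)
    b0, w = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for i in range(n):
            z = b0 + sum(w[j] * columns[j][i] for j in range(p))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - labels[i]
            g0 += err
            for j in range(p):
                g[j] += err * columns[j][i]
        b0 -= lr * g0 / n
        w = [w[j] - lr * g[j] / n for j in range(p)]
    return b0, w

def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic data (assumption): three analytes from different omes track one
# latent signal that also drives the outcome; a fourth analyte is pure noise.
random.seed(0)
n = 120
latent = [random.gauss(0, 1) for _ in range(n)]
labels = [1 if v + random.gauss(0, 0.3) > 0 else 0 for v in latent]
data = {
    "microbe_A": [v + random.gauss(0, 0.3) for v in latent],
    "metabolite_B": [v + random.gauss(0, 0.3) for v in latent],
    "enzyme_C": [v + random.gauss(0, 0.3) for v in latent],
    "metabolite_noise": [random.gauss(0, 1) for _ in range(n)],
}

# Stage 1: pairwise models -> network. Stage 2: neighborhood -> logistic model.
network = build_network(data, threshold=0.5)
hood = sorted({"microbe_A"} | network["microbe_A"])
intercept, weights = fit_logistic([data[a] for a in hood], labels)
scores = [intercept + sum(w * data[a][i] for w, a in zip(weights, hood))
          for i in range(n)]
model_auc = auc(scores, labels)
print(hood, round(model_auc, 3))
```

In this toy setting the neighborhood of `microbe_A` spans three omes, so the fitted model draws its predictors from multiple omes by construction. In a real analysis the edge criterion, the neighborhood definition, and the model evaluation would follow the CANTARE User's Guide rather than these placeholders.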

Highlights

  • One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes

  • Though the exact mechanisms underlying the disease pathogenesis are not fully understood, recent studies have found a number of environmental factors including diet, medications, and the gut microbiota that can trigger an overactive mucosal immune response in the host, and have been linked to increasing inflammatory bowel disease (IBD) prevalence [32]

  • The current diagnostic method for IBD consists of a combination of a detailed history assessment, physical and laboratory examination, esophagogastroduodenoscopy, ileo-colonoscopy combined with histology, and imaging of the small bowel [33,34,35]


Summary

Introduction

One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes. Franzosa et al. analyzed the gut microbiome, microbial metagenome, and metabolome in inflammatory bowel disease (IBD), which includes Crohn’s disease and ulcerative colitis [3]. Analyzing these high-dimensional and complex data sets to identify and visualize tractable multi-omic patterns remains a challenge. Some researchers analyze data from one ome at a time and combine predictions across omes, perhaps using a weighted average or other ensemble [1, 4, 5]. This approach is sometimes called late integration [6]. Promising analytes from one ome (e.g., transcripts or metabolites) can be further characterized with pathway analysis [2] or functional enrichment [13]. Correlation studies may not account for relationships that differ by disease state.
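
As a small illustration of late integration by weighted average, the sketch below combines per-ome predicted probabilities into one prediction per sample. The ome names, probabilities, and weights are made-up values for illustration, and the function name is hypothetical; how the weights would be chosen in practice is not specified here.

```python
def late_integration(per_ome_probs, weights):
    """Weighted average of per-ome predicted probabilities, sample by sample."""
    total = sum(weights[ome] for ome in per_ome_probs)
    n_samples = len(next(iter(per_ome_probs.values())))
    return [
        sum(weights[ome] * probs[i] for ome, probs in per_ome_probs.items()) / total
        for i in range(n_samples)
    ]

# Hypothetical predictions for two samples from two single-ome models.
per_ome_probs = {
    "microbiome": [0.9, 0.2],
    "metabolome": [0.7, 0.4],
}
weights = {"microbiome": 1.0, "metabolome": 3.0}

combined = late_integration(per_ome_probs, weights)
print(combined)  # approximately [0.75, 0.35]
```

Each ome is modeled separately in this scheme, so relationships between analytes from different omes never enter any single model.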

