Abstract

Technologies for profiling samples using different omics platforms have been at the forefront since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch, with an arbitrary decision over which tools to use and how to combine them. There is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines across different cases. We have designed a conceptual framework (STATegra), intended to be as generic as possible for multi-omics analysis, that combines available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. Although we have previously combined these integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two case studies from The Cancer Genome Atlas (TCGA). For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate that the framework, beyond the individual tools, has an enhanced capacity to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all samples are profiled for all omics. The STATegra framework is built from several tools, which are being integrated step by step as open source in the STATegRa Bioconductor package.

Highlights

  • Computational and experimental developments have enabled the profiling of multiple layers of cell regulation: genome, transcriptome, epigenome, chromatin conformation or metabolome, among many globally known “omics” (Ramos et al, 2017; Gomez-Cabrero et al, 2019)

  • While every multi-omics data combination is different, we believe that a general framework is key to gaining the knowledge needed for an “optimized” integrated research analysis in the future

  • We here present the STATegra framework, a multi-omics integrative pipeline, the result of integrative analyses done over the last decade (Karathanasis et al, 2016; Carlström et al, 2019; Ewing et al, 2019, 2020; Fernandes et al, 2019)



Introduction

Computational and experimental developments have enabled the profiling of multiple layers of cell regulation: genome, transcriptome, epigenome, chromatin conformation or metabolome, among many globally known “omics” (Ramos et al, 2017; Gomez-Cabrero et al, 2019). We introduce the STATegra framework, in which we integrate three multi-omics-based approaches into a single pipeline: (a) Component Analysis (CA) to understand the coordination among omics data types (Måge et al, 2019); (b) Non-Parametric Combination (NPC) analysis to leverage paired designs to increase statistical power (Karathanasis et al, 2016); and (c) an integrative exploratory analysis (Ewing et al, 2020). This framework may be extended by including additional tools such as network analysis (Barabási et al, 2011; Yugi et al, 2016). We incorporated most of these tools into the STATegRa Bioconductor package to facilitate their use. The package is continuously being updated.
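
To make the idea behind step (b) concrete, the following is a minimal Python sketch of a non-parametric combination test for a single feature measured in several omics on the same samples. It is an illustration only and not the STATegRa API: the function name `npc_fisher`, the choice of the absolute difference in group means as the partial statistic, and the use of Fisher's combining function are all assumptions made for this example. The point it shows is that the same label permutation is applied to every omic, so the dependence between data types on the paired samples is preserved when the partial p-values are combined.

```python
import numpy as np


def npc_fisher(omics, labels, n_perm=999, seed=0):
    """Hypothetical helper: NPC test for one feature across several omics.

    omics  : list of 1-D arrays, one per omic, all measured on the same samples
    labels : 1-D array of 0/1 group labels, one per sample
    Returns (partial p-values per omic, combined global p-value).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    data = np.vstack(omics)  # shape: (n_omics, n_samples)

    def partial_stats(lab):
        # Assumed partial statistic: absolute difference in group means per omic.
        return np.abs(data[:, lab == 1].mean(axis=1) - data[:, lab == 0].mean(axis=1))

    # Row 0 holds the observed statistics; rows 1..n_perm hold permuted ones.
    # Each permutation shuffles the labels once and reuses them for all omics.
    stats = np.empty((n_perm + 1, data.shape[0]))
    stats[0] = partial_stats(labels)
    for b in range(1, n_perm + 1):
        stats[b] = partial_stats(rng.permutation(labels))

    # Partial p-value of every row: fraction of rows with a statistic at least as large.
    # The row itself is counted, so no p-value is ever exactly zero.
    pvals = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)

    # Fisher's combining function (an assumption here), applied to every row.
    combined = -2.0 * np.log(pvals).sum(axis=1)

    # Global p-value: how often a permuted combined statistic reaches the observed one.
    global_p = (combined >= combined[0]).mean()
    return pvals[0], global_p


if __name__ == "__main__":
    # Simulated data: 10 controls and 10 cases, two omics with a shared group shift.
    demo_rng = np.random.default_rng(42)
    groups = np.array([0] * 10 + [1] * 10)
    expression = demo_rng.normal(size=20) + 1.0 * groups
    methylation = demo_rng.normal(size=20) + 1.0 * groups
    partial, overall = npc_fisher([expression, methylation], groups)
    print("partial p-values:", partial)
    print("combined p-value:", overall)
```

Other combining functions (for example Tippett's minimum p-value or Liptak's normal-score sum) could be substituted on the line that computes `combined` without changing the rest of the sketch.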
