Abstract

Integration of multiomics data remains a key challenge in fulfilling the potential of comprehensive systems biology. Multiple-block orthogonal projections to latent structures (OnPLS) is a projection method that simultaneously models multiple data matrices, reducing feature space without relying on a priori biological knowledge. To improve the interpretability of OnPLS models, the associated multi-block variable influence on orthogonal projections (MB-VIOP) method is used to identify the variables with the highest contribution to the model. This study combined OnPLS and MB-VIOP with interactive visualization methods to interrogate an exemplar multiomics study, using a subset of 22 individuals from an asthma cohort. Joint data structure was assessed in six data blocks: transcriptomics; metabolomics; targeted assays for sphingolipids, oxylipins, and fatty acids; and a clinical block including lung function, immune cell differentials, and cytokines. The model identified seven components, two of which had contributions from all blocks (globally joint structure) and five of which had contributions from two to five blocks (locally joint structure). Components 1 and 2 were the most informative, identifying differences between healthy controls and asthmatics and a disease–sex interaction, respectively. The interactions between features selected by MB-VIOP were visualized using chord plots, yielding putative novel insights into asthma disease pathogenesis, the effects of asthma treatment, and the biological roles of uncharacterized genes. For example, the gene ATP6V1G1, which has been implicated in osteoporosis, correlated with metabolites that are dysregulated by inhaled corticosteroids (ICS), providing insight into the mechanisms underlying bone density loss in asthma patients taking ICS.
These results show the potential for OnPLS, combined with MB-VIOP variable selection and interaction visualization techniques, to generate hypotheses from multiomics studies and inform biology.

Highlights

  • In addition to classical univariate statistical methods, machine-learning techniques have become routinely used to interrogate and understand vast amounts of data.[3,4] Two common characteristics of -omics data are that the number of measured variables is vastly greater than the number of observations[5] and that there is a degree of multicollinearity between variables.[6]

  • Multivariate projection methods such as orthogonal projection to latent structures discriminant analysis (OPLS-DA) have proven successful in modeling the underlying latent biological structure within a single high-dimensional data block; however, they are theoretically unsuitable for modeling multiple data blocks simultaneously.

  • Galindo-Prieto et al. adapted the variable influence on projection (VIP) concept for multi-block data analysis to identify the variables that contribute to these different levels of joint structure.
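
For orientation, the classical single-block VIP score that was adapted for multi-block analysis can be sketched as follows. This is a minimal NumPy sketch of a single-response PLS model fitted by NIPALS, followed by the standard VIP formula; it is not the MB-VIOP algorithm itself. The function names, the simulated data, and the choice of two components are assumptions made purely for illustration.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model with NIPALS (illustrative helper)."""
    X = X - X.mean(axis=0)          # column-center the predictors
    y = y - y.mean()                # center the response
    n, p = X.shape
    W = np.zeros((p, n_components))  # unit-length weight vectors
    T = np.zeros((n, n_components))  # score vectors
    q = np.zeros(n_components)       # y-loadings
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p_load = X.T @ t / tt
        q[a] = (y @ t) / tt
        X = X - np.outer(t, p_load)  # deflate X
        y = y - q[a] * t             # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q

def vip_scores(W, T, q):
    """Classical VIP: weight each squared, normalized PLS weight by the
    response variance explained by its component."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)              # explained SS per component
    w_norm2 = (W / np.linalg.norm(W, axis=0))**2  # squared normalized weights
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())

# Simulated data (hypothetical): 22 observations, 200 variables,
# with the response driven by the first two variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(22, 200))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=22)

W, T, q = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
```

By construction the squared VIP values average to one, which is why variables with a VIP above 1 are commonly read as more influential than average.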


Summary

Introduction

In addition to classical univariate statistical methods, machine-learning techniques have become routinely used to interrogate and understand vast amounts of data.[3,4] Two common characteristics of -omics data are that the number of measured variables is vastly greater than the number of observations[5] and that there is a degree of multicollinearity between variables.[6]

To address the need for multivariate methods that simultaneously model multiple data matrices, a number of multi-block data integration methods have been proposed.[21–23] In 2011, Löfstedt and Trygg[24] proposed a novel multi-block multivariate method called OnPLS, which utilizes the framework of OPLS to decompose data from more than two input matrices. Multi-block models such as OnPLS are fully symmetric, meaning that each data block is weighted to allow an equal contribution to the model, regardless of the number of variables or the underlying data structure within each block.[25] Multi-block approaches offer further advantages over single-block modeling or block concatenation in biomarker discovery.

Clinical classification and enrollment criteria were previously described.[29,31] Participant data were included in the present study if the participants were classified as either healthy controls or severe asthmatics in the existing cohort and data from all data blocks (described ) were collected.
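
The equal-contribution idea mentioned above for multi-block models can be illustrated with a simple block-scaling step. The sketch below autoscales each block and then divides it by its Frobenius norm, so that every block carries the same total sum of squares regardless of how many variables it has. This is a common preprocessing convention and is shown only as an assumption-laden illustration; it is not claimed to be how OnPLS or the study weights its blocks. The helper name `block_scale`, the block names, and the variable counts are hypothetical; only the 22 observations come from the text.

```python
import numpy as np

def block_scale(block):
    """Autoscale columns, then scale the whole block to unit Frobenius norm."""
    centered = block - block.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0              # avoid dividing constant columns by zero
    scaled = centered / std
    return scaled / np.linalg.norm(scaled)

rng = np.random.default_rng(0)
n = 22                                # observations, as in the study subset
blocks = {                            # hypothetical block sizes
    "transcriptomics": rng.normal(size=(n, 500)),
    "clinical": rng.normal(size=(n, 12)),
}

scaled = {name: block_scale(b) for name, b in blocks.items()}

# Total sum of squares per block after scaling; equal across blocks.
total_ss = {name: float(np.sum(b**2)) for name, b in scaled.items()}
```

Without such a step, simply concatenating the blocks would let the 500-variable block dominate the 12-variable block in any variance-driven projection.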
