Abstract

Having large sets of predictors from multiple sources concerning the same observation units and the same criterion is becoming increasingly common in chemometrics. When analyzing such data, chemometricians often have multiple objectives: prediction of the criterion, variable selection, and identification of underlying processes associated with individual predictor sources or with several sources jointly. Existing methods address the first two aims, uncovering the predictive mechanisms and the relevant variables therein, for a single block of predictor variables, but the challenge of uncovering joint and distinctive predictive mechanisms, and the relevant variables therein, in the multisource setting has yet to be addressed. To this end, we present a multiblock extension of principal covariates regression that aims to find the complex mechanisms in which several sources or a single source may be involved; taken together, these mechanisms predict an outcome of interest. We call this method sparse common and distinctive covariates regression (SCD-CovR). Through a simulation study, we demonstrate that SCD-CovR provides competitive solutions when compared with related methods. The method is also illustrated via an application to a publicly available dataset.
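SCD-CovR builds on principal covariates regression (PCovR). As a point of reference, here is a minimal sketch of plain single-block, non-sparse PCovR via its eigendecomposition solution: the component scores T = XW are the leading eigenvectors of a weighted mix of the predictor cross-product XX' and the criterion cross-product projected onto the column space of X, with `alpha` balancing reconstruction of X against prediction of y. This is not SCD-CovR itself: the sparsity penalties and the common/distinctive constraints on the weights are omitted, centered data are assumed, and the function name `pcovr` and its arguments are hypothetical.

```python
import numpy as np


def pcovr(X, y, n_components, alpha):
    """Plain (single-block, non-sparse) principal covariates regression.

    Minimizes, over scores T = X @ W with T.T @ T = I,
        alpha * ||X - T Px'||^2 / ||X||^2
        + (1 - alpha) * ||y - T py||^2 / ||y||^2.
    X (n x p) and y (length n) are assumed to be column-centered.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    X_pinv = np.linalg.pinv(X)
    Hy = X @ (X_pinv @ y)  # projection of y onto the column space of X
    # Weighted mix of predictor and projected-criterion cross-products.
    G = (alpha * (X @ X.T) / np.sum(X ** 2)
         + (1 - alpha) * (Hy @ Hy.T) / np.sum(y ** 2))
    # G is symmetric, so eigh applies; it sorts eigenvalues ascending.
    _, eigvecs = np.linalg.eigh(G)
    T = eigvecs[:, ::-1][:, :n_components]  # leading eigenvectors = scores
    W = X_pinv @ T   # component weights, so that X @ W reproduces T
    Px = X.T @ T     # loadings of the predictors
    py = T.T @ y     # regression weights of the criterion on the scores
    return W, Px, py


# Usage on simulated data (hypothetical example values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=50)
y -= y.mean()
W, Px, py = pcovr(X, y, n_components=2, alpha=0.5)
y_hat = X @ W @ py  # predicted criterion values, shape (50, 1)
```

With `alpha` equal to 1 the scores span the leading principal components of X; with `alpha` equal to 0 they concentrate entirely on predicting y, so a single component then reproduces the ordinary least-squares fit.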

Highlights

  • When predicting an outcome from a number of predictor variables, there is often the additional aim of gaining insight into the mechanisms at play

  • We evaluate the performance of SCD-CovR by comparing it with related methods that share similar goals, such as sparse generalized canonical correlation analysis (SGCCA), which is based on partial least squares (PLS).[17]

  • Inspecting the weights matrices produced by the two best-performing methods, we found that sparse PCovR (SPCovR) produced two common components and one component distinctive to the chemical block, whereas SCD-CovR found one common component and one distinctive component for each predictor block
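The common/distinctive distinction in the highlight above is read off the weights matrix: a component is common when it has non-zero weights in more than one predictor block, and distinctive when its non-zero weights are confined to a single block. A minimal sketch of that classification, assuming the rows of the weights matrix are ordered block by block; the helper `classify_components`, the tolerance `tol`, and the example weights are hypothetical.

```python
def classify_components(W, block_sizes, tol=1e-12):
    """Label each component (column of W) as common or distinctive.

    W is a list of rows, one per predictor, with the rows of block 1 first,
    then block 2, and so on; block_sizes gives the number of rows per block.
    """
    if len(W) != sum(block_sizes):
        raise ValueError("W must have one row per predictor")
    # (start, stop) row range of each block in the concatenated predictors.
    bounds, start = [], 0
    for size in block_sizes:
        bounds.append((start, start + size))
        start += size
    labels = []
    for r in range(len(W[0])):
        # Blocks in which component r has at least one non-zero weight.
        active = [k for k, (a, b) in enumerate(bounds)
                  if any(abs(W[i][r]) > tol for i in range(a, b))]
        if len(active) > 1:
            labels.append("common")
        elif len(active) == 1:
            labels.append(f"distinctive to block {active[0] + 1}")
        else:
            labels.append("all-zero")
    return labels


# Hypothetical weights: 3 predictors in block 1, 2 in block 2, 3 components.
W = [
    [0.5, 0.3, 0.0],
    [0.0, 0.7, 0.0],
    [0.2, 0.0, 0.0],
    [0.4, 0.0, 0.6],
    [0.0, 0.0, 0.8],
]
print(classify_components(W, [3, 2]))
# ['common', 'distinctive to block 1', 'distinctive to block 2']
```

This toy pattern mirrors the structure reported for SCD-CovR (one common component plus one distinctive component per block). With sparse methods the tolerance matters little, since penalized weights are typically exactly zero.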

Introduction

When predicting an outcome from a number of predictor variables, there is often the additional aim of gaining insight into the mechanisms at play. To obtain an even deeper understanding of the system under study, large and heterogeneous collections of data are often used, which results in several blocks of predictors pertaining to the same observation units. Such data are used, for example, to better understand disease mechanisms by jointly studying several features of the biological system (e.g., genomic, transcriptomic, and proteomic data collected from the same sample of patients and controls).[2] Obtaining insights from such large multiblock data implies revealing (1) the relevant features in the system and (2) the orchestration of the system.
