Performance of methods that separate common and distinct variation in multiple data blocks

Ingrid Måge, Frans M Van Der Kloet, Age K Smilde

doi:10.1002/cem.3085

Abstract

In many areas of science, multiple sets of data are collected from the same samples. Such data sets can be analyzed by multiblock (or data fusion) methods, usually with the aim of gaining a holistic understanding of the system or better predicting some response. Lately, several scientific groups have developed methods for separating common and distinct variation between multiple data blocks. Although the objective is the same, these methods use completely different strategies and algorithms. In this paper, we investigate the practical properties of the four most popular methods for separating common and distinct variation: JIVE, DISCO, PCA‐GCA, and OnPLS. The main barrier to using any of these methods is model selection and validation, especially when the number of blocks is more than two. Using extensive simulations, we have elucidated three properties that are important for assessing the validity of the results: the ability to identify the correct model, the ability to estimate the true underlying subspaces, and the robustness towards misspecification of the model. The simulated data sets mimic a range of “real life” data with different dimensionalities and variance structures. We are thus able to identify which methods work best for different types of data structures and to pinpoint weak spots for each method. The results show that PCA‐GCA works best for model selection, while JIVE and DISCO give the best estimates of the subspaces and are the most robust towards model misspecification.
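To make "estimating the true, underlying subspaces" concrete, here is a minimal two-block sketch in Python/NumPy. It is loosely in the spirit of PCA‐GCA (a PCA per block, followed by a canonical analysis of the block score spaces), but it is not the authors' implementation, nor JIVE, DISCO, OnPLS, or the full PCA‐GCA procedure. The simulation design (one common and one distinct component per block, noise level 0.1), the assumption that the block ranks are known, and all function names are hypothetical choices for illustration. For orthonormal bases Q1 and Q2 of two centred score spaces, the singular values of Q1ᵀQ2 are the canonical correlations, i.e. the cosines of the principal angles between the subspaces; a value near 1 flags a shared (common) direction.

```python
import numpy as np


def orth(A):
    """Orthonormal basis for the column space of A (thin QR; assumes full column rank)."""
    q, _ = np.linalg.qr(A)
    return q


def principal_angle_cosines(A, B):
    """Cosines of the principal angles between col(A) and col(B), largest first."""
    return np.linalg.svd(orth(A).T @ orth(B), compute_uv=False)


def simulate_blocks(n=100, p1=50, p2=60, noise=0.1, seed=0):
    """Hypothetical design: one common score vector plus one distinct score vector per block."""
    rng = np.random.default_rng(seed)
    t_c, t_1, t_2 = (rng.standard_normal((n, 1)) for _ in range(3))
    X1 = t_c @ rng.standard_normal((1, p1)) + t_1 @ rng.standard_normal((1, p1))
    X2 = t_c @ rng.standard_normal((1, p2)) + t_2 @ rng.standard_normal((1, p2))
    X1 = X1 + noise * rng.standard_normal(X1.shape)
    X2 = X2 + noise * rng.standard_normal(X2.shape)
    return X1, X2, t_c


def pca_scores(X, r):
    """Column-centre X and return its first r PCA score vectors."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * s[:r]


def common_direction(S1, S2):
    """Top canonical variate shared by two score spaces, plus all canonical correlations."""
    Q1, Q2 = orth(S1), orth(S2)
    U, s, Vt = np.linalg.svd(Q1.T @ Q2)
    # Q1 @ U[:, 0] and Q2 @ Vt[0] are the best-matched unit vectors; average them.
    c = (Q1 @ U[:, :1] + Q2 @ Vt[:1].T) / 2
    return c, s


X1, X2, t_c = simulate_blocks()
c, canon = common_direction(pca_scores(X1, 2), pca_scores(X2, 2))
# Compare the estimate with the (centred) true common scores.
recovery = principal_angle_cosines(c, t_c - t_c.mean())[0]
print("canonical correlations:", np.round(canon, 3))
print("cos(angle) between estimated and true common scores:", round(recovery, 3))
```

With these settings the first canonical correlation should be close to 1 and the second much smaller, and the estimated common direction should nearly coincide with the true common scores. Raising `noise` or passing a wrong rank `r` to `pca_scores` is one simple way to probe the kind of sensitivity to model misspecification that the abstract discusses.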


Journal: Journal of Chemometrics
Publication Date: Oct 9, 2018
Citations: 19

Similar Papers

Common and distinct components in data fusion
Age K Smilde ... Ingrid Måge
Journal of Chemometrics | VOL. 31 | 31 May 2017

Sequence Kernel Association Tests for the Combined Effect of Rare and Common Variants
Iuliana Ionita-Laza ... Xihong Lin
The American Journal of Human Genetics | VOL. 92 | 16 May 2013

Dealing with data conflicts in statistical inference of population assessment models that integrate information from multiple diverse data sets
Mark N Maunder ... Kevin R Piner
Fisheries Research | VOL. 192 | 10 Feb 2017

NDT Data Fusion in the Aerospace Industry
James M. Nelson ... Richard H. Bossi
01 Jan 2001
