Abstract
In many areas of science, multiple sets of data are collected from the samples. Such data sets can be analyzed by multiblock (or data fusion) methods. The aim is usually to get a holistic understanding of the system or better prediction of some response. Lately, several scientific groups have developed methods for separating common and distinct variation between multiple data blocks. Although the objective is the same, the strategies and algorithms of these methods are completely different. In this paper, we investigate the practical properties of the four most popular methods for separating common and distinct variation: JIVE, DISCO, PCA‐GCA, and OnPLS. The main barrier to using any of these methods is model selection and validation, especially when the number of blocks is more than two. Using extensive simulations, we have elucidated three properties that are important for assessing the validity of the results: the ability to identify the correct model, the ability to estimate the true underlying subspaces, and the robustness towards misspecification of the model. The simulated data sets mimic a range of “real life” data, with different dimensionalities and variance structures. We are thus able to identify which methods work best for different types of data structures and to pinpoint weak spots for each method. The results show that PCA‐GCA works best for model selection, while JIVE and DISCO give the best estimates of the subspaces and are most robust towards model misspecification.