Abstract

Multiblock analysis addresses the problem of how to combine data from various data sources for purposes such as prediction, classification, clustering, or visual data analysis. A key concept is the distinction between “common” and “distinct” parts, that is, what information repeats itself across the blocks and what is unique to an individual block.

The statistical field of multiblock analysis holds many different approaches, which leads both to different treatments of the terms distinct and common themselves and to differences in the numerical results. In this article, we extend the discussion of distinct and common in multiblock analysis to the domain of distance matrices, that is, the situation where sets of data points, so‐called configurations, are analyzed via relative distances, either because the configurations are not available directly or because a distance representation is favorable. Situations typical for chemometrics will be highlighted and illustrated in examples.

When analyzing different methods, we have focused on three key aspects. First, during the transition from the distance domain to the configuration domain, one needs to consider how multiple distance matrices are treated. Second, when extracting common and distinct parts, one needs to manage a tradeoff between explaining variance and ensuring similarity between subspaces. Third, there is a design choice to be made as to whether the subspace containing the common parts is “shared” between blocks or whether separate subspaces are associated with each individual block. The three aspects help to categorize and explain well‐known methods in the field. A selection of methods was analyzed and subsequently applied to examples.

Highlights

  • We have focused on two methods: INDSCAL with a constrained version of MDS (CMDS), and multidimensional scaling (MDS) followed by GCA.

  • In the field of multiblock analysis, there is a large literature on the extraction of common and distinct components.

Introduction

Distance data are relevant in several domains and have been used extensively in psychology and sociology, based on notions of “similar” and “dissimilar,” or on rankings, to quantify the distance between sets of concepts, categories, samples, and so forth.[1,2] A similar application occurs in sensory analysis, where distances between products, for instance wines, are used to place the products on a sensory map, and frequencies of word descriptions are used to interpret the meaning of the coordinate axes.[3] Even when the original representation is not in the form of distance data, it may be convenient to use distances in some analyses. One example is the fusion of data sources of very different formats, because of differing dimensions or because variables are of different types, such as binary and continuous.[5,6] Another example is when prior information is most easily incorporated in the form of distance matrices, such as phylogenetic information about microbial species in UniFrac[7] distance matrices.
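
To make the step from a distance matrix to a configuration concrete, the following is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. It is not taken from the article and is not the constrained MDS variant mentioned in the highlights. The function names `classical_mds` and `pairwise_distances` and the small random data set are assumptions chosen for illustration only. The sketch double-centers the squared distances and uses the leading eigenvectors as coordinates, which recovers the configuration up to rotation, reflection, and translation when the distances are Euclidean.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: distance matrix -> configuration.

    Illustrative sketch only; assumes D is a symmetric matrix of
    pairwise distances with zeros on the diagonal.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centering matrix J = I - (1/n) 11^T
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances give the Gram matrix of the
    # centered configuration when D is Euclidean.
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the largest eigenvalues; clip small negatives caused by
    # rounding or by non-Euclidean input.
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)


# Illustrative data: six points in the plane (not from the article).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = pairwise_distances(X)

Y = classical_mds(D, n_components=2)
D_rec = pairwise_distances(Y)
print("max distance error:", np.abs(D - D_rec).max())
```

The recovered coordinates `Y` generally differ from `X` by a rotation, reflection, or shift, so the comparison is made on the distances rather than on the coordinates themselves. With several distance matrices, one per block, this step would have to be applied per block or replaced by a joint method, which is the first design aspect discussed in the abstract.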
