Abstract

Background
For multivariate data analysis involving only two input matrices (e.g., X and Y), the previously published methods for variable influence on projection (e.g., VIP_OPLS or VIP_O2PLS) are widely used for variable selection purposes, including (i) variable importance assessment, (ii) dimensionality reduction of big data and (iii) interpretation enhancement of PLS, OPLS and O2PLS models. For multiblock analysis, OnPLS models find relationships among multiple data matrices (more than two blocks) by calculating latent variables; however, no method has been available until now for improving the interpretation of these latent variables (model components) by assessing the importance of the input variables.

Results
This paper presents a method for variable selection in multiblock analysis, called multiblock variable influence on orthogonal projections (MB-VIOP). MB-VIOP is a model-based variable selection method that uses the data matrices, the scores and the normalized loadings of an OnPLS model to sort the input variables of more than two data matrices according to their importance, both for simplification and for interpretation of the total multiblock model and of the unique, local and global model components separately. MB-VIOP has been tested using three datasets: a synthetic four-block dataset, a real three-block omics dataset related to plant sciences, and a real six-block dataset related to the food industry.

Conclusions
We provide evidence for the usefulness and reliability of MB-VIOP by means of three examples (one synthetic and two real-world cases). MB-VIOP reliably and efficiently assesses the importance of both isolated variables and ranges of variables in any type of data. MB-VIOP connects the input variables of different data matrices according to their relevance for the interpretation of each latent variable, yielding enhanced interpretability for each OnPLS model component. In addition, MB-VIOP can handle strong overlap between types of variation, as well as many data blocks of very different dimensionality. The ability of MB-VIOP to generate dimensionality-reduced models with high interpretability makes this method well suited to big data mining, multi-omics data integration and any study that requires exploration and interpretation of large streams of data.

Highlights

  • For multivariate data analysis involving only two input matrices (e.g., X and Y), the previously published methods for variable influence on projection (e.g., VIP_OPLS or VIP_O2PLS) are widely used for variable selection purposes, including (i) variable importance assessment, (ii) dimensionality reduction of big data and (iii) interpretation enhancement of partial least squares (PLS), orthogonal projections to latent structures (OPLS) and 2-block orthogonal projections to latent structures (O2PLS) models

  • The results and the discussion aim to validate the multiblock variable influence on orthogonal projections (MB-VIOP) method for its application in orthogonal projections to latent structures (OnPLS) models

  • MB-VIOP uses the symmetry of OnPLS for establishing fairer relationships/influences between variables of different blocks iterating over all components and all blocks, i.e. considering all combinations

Summary

Introduction

For multivariate data analysis involving only two input matrices (e.g., X and Y), the previously published methods for variable influence on projection (e.g., VIP_OPLS or VIP_O2PLS) are widely used for variable selection purposes, including (i) variable importance assessment, (ii) dimensionality reduction of big data and (iii) interpretation enhancement of PLS, OPLS and O2PLS models. Multiblock methods based on orthogonal projections have attracted interest in the life sciences because of the model structure into which they can decompose the data blocks; two examples are multi-omics factor analysis (MOFA), presented by Argelaguet et al. in 2018 [33], and N-block orthogonal projections to latent structures (OnPLS), presented by Löfstedt and Trygg in 2011 [34]. The latter can provide some of the input parameters that MB-VIOP uses for improved model interpretation. OnPLS provides a means to take full advantage of the shared and unique variation in more than two data blocks.
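
To make the idea of variable influence on projection concrete, the sketch below computes the classic single-block PLS VIP from scores, weights and Y-loadings. It is not MB-VIOP and not the VIP_OPLS or VIP_O2PLS variants, whose formulas are not given in this excerpt; the function name `vip_scores` and the argument names `T`, `W` and `Q` are assumptions chosen for illustration, and the input matrices here are random placeholders rather than a fitted model.

```python
import numpy as np


def vip_scores(T, W, Q):
    """Classic PLS VIP (illustrative only; not the MB-VIOP formula).

    T: (n_samples, n_components) X-scores
    W: (n_variables, n_components) X-weights
    Q: (n_responses, n_components) Y-loadings
    """
    T = np.asarray(T, dtype=float)
    W = np.asarray(W, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n_variables = W.shape[0]

    # Sum of squares of Y explained by each component: (t_a' t_a) * (q_a' q_a).
    ss = np.sum(T**2, axis=0) * np.sum(Q**2, axis=0)

    # Normalize each weight vector to unit length so components are comparable.
    W_norm = W / np.linalg.norm(W, axis=0, keepdims=True)

    # Weighted average of squared normalized weights, scaled by variable count.
    return np.sqrt(n_variables * (W_norm**2 @ ss) / ss.sum())


# Placeholder inputs standing in for a fitted two-component PLS model.
rng = np.random.default_rng(0)
T = rng.normal(size=(20, 2))
W = rng.normal(size=(5, 2))
Q = rng.normal(size=(1, 2))

vip = vip_scores(T, W, Q)
ranking = np.argsort(vip)[::-1]  # variable indices, most influential first
print(vip, ranking)
```

With unit-length weight vectors the squared VIP values sum to the number of variables, which is why a value of 1 is commonly used as a rough threshold for "above-average" influence in the classic formulation.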
