Abstract
There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as “reliable” or “unreliable” based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using a clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between the conventional and modified methods applied to proton nuclear magnetic resonance (1H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that the modified method gives more plausible results. In particular, clusters containing little information were reliably detected. This strategy, named “cluster-aided MCR-ALS,” will facilitate the attainment of more reliable results from metabolomics datasets.
Highlights
There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers
The multivariate curve resolution-alternating least squares (MCR-ALS) calculation was repeated, with the number of components changed for each calculation
To set the cluster-size cutoff objectively, the maximum cluster size estimated from a dataset that had been randomized to destroy all biological information was used as the threshold size
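The classification step described in the highlights can be illustrated with a small sketch. It is not the authors' implementation: the correlation-based single-linkage clustering, the `min_corr` cutoff of 0.9, the function names, and the toy profiles are all assumptions made for illustration. The sketch pools component profiles from repeated runs, groups profiles that reappear, and keeps only clusters larger than the largest cluster found in components from randomized data.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cluster_components(components, min_corr=0.9):
    """Single-linkage clustering (assumed here): link any two profiles
    whose correlation is at least min_corr, then return index groups."""
    parent = list(range(len(components)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            if pearson(components[i], components[j]) >= min_corr:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(components)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def reliable_clusters(components, null_components, min_corr=0.9):
    """Keep clusters strictly larger than the largest cluster obtained
    from components of a randomized (null) dataset."""
    null_sizes = [len(c) for c in cluster_components(null_components, min_corr)]
    threshold = max(null_sizes)
    kept = [c for c in cluster_components(components, min_corr)
            if len(c) > threshold]
    return kept, threshold


# Toy data standing in for profiles pooled from repeated runs (hypothetical).
random.seed(0)
n_points = 50


def peak(center):
    return [math.exp(-((k - center) / 3.0) ** 2) for k in range(n_points)]


def noisy(profile, sd=0.02):
    return [v + random.gauss(0, sd) for v in profile]


def noise_only():
    return [random.gauss(0, 1) for _ in range(n_points)]


pooled = [noisy(peak(15)) for _ in range(5)]    # profile seen in 5 runs
pooled += [noisy(peak(35)) for _ in range(5)]   # second profile, 5 runs
pooled += [noise_only() for _ in range(4)]      # non-reproducible components
null_pool = [noise_only() for _ in range(12)]   # components from randomized data

reliable, threshold = reliable_clusters(pooled, null_pool)
print("threshold size:", threshold)
print("reliable cluster sizes:", sorted(len(c) for c in reliable))
```

In a real analysis, `pooled` would hold the spectral profiles returned by MCR-ALS runs with different numbers of components, and `null_pool` would come from the same runs on the randomized dataset. The choice of similarity measure and linkage affects which profiles count as the same component, so both would need to be checked against the data.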
Summary
There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. “Omics” technologies, including genomics, transcriptomics, proteomics, and metabolomics/metabonomics, have been developed to obtain a bird’s-eye view of the underlying molecular networks in a cell or organism that elaborately regulate its complex biological responses[1,2]. Cross-site analytical validity studies have been conducted, showing that interconvertibility of NMR data among different institutions is one of the great advantages of NMR-based approaches[11]. This property is essential for the clinical application of metabolomics-derived biomarker discovery assisted by multivariate statistical approaches to the analysis of NMR datasets[12,13]. The MCR method is useful for resolving spectroscopic data featuring broad macromolecular peaks[23] and for estimating concentrations from metabolite mixture spectra[23]. To use these methods, determining the number of components is the most important task. This inconsistency makes it difficult to use ICA/NMF/MCR, as using the wrong number of components in the analysis decreases the reliability of the results.
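To see why the number of components matters, it may help to recall the bilinear model that MCR-ALS is conventionally built on. The notation below is the standard textbook form and is not taken from this summary; the symbols $D$, $C$, $S$, $E$, and $k$ are introduced here for illustration only.

```latex
% D: n x m data matrix (n samples, m spectral points)
% C: n x k concentration profiles, S: m x k pure-component spectra
% E: n x m residuals, k: number of components, fixed before fitting
D = C S^{\mathsf{T}} + E,
\qquad
\min_{C \ge 0,\; S \ge 0} \; \lVert D - C S^{\mathsf{T}} \rVert_F^{2}
```

Alternating least squares updates $C$ with $S$ held fixed and then $S$ with $C$ held fixed, under the non-negativity constraints. Because $k$ sets the shapes of $C$ and $S$ before the fit begins, a poorly chosen $k$ changes every resolved profile, which is the difficulty the summary refers to.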