Abstract

There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as “reliable” or “unreliable” based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance (1H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named “cluster-aided MCR-ALS,” will facilitate the attainment of more reliable results in the metabolomics datasets.

Highlights

  • There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers

  • The multivariate curve resolutionalternating least squares (MCR-ALS) calculation was repeated with the number of components being changed for each calculation

  • For assignment of the optimum selected cluster size with objectivity, the maximum cluster size estimated from a dataset that had been randomized to destroy all biological information was set as a threshold size

Read more

Summary

Introduction

There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. “Omics” technologies, including genomics, transcriptomics, proteomics, and metabolomics/metabonomics, have been developed to obtain a bird’s-eye view of the underlying molecular networks in a cell or organism that elaborately regulate its complex biological responses[1,2] Cross-site analytical validity studies have been conducted, showing that interconvertibility of NMR data among different institutions is one of the great advantages of NMR-based approaches[11] This property is essential for the clinical application of metabolomics-derived biomarker discovery assisted by multivariate statistical approaches to the analysis of NMR datasets[12,13]. The MCR method is useful for resolving spectroscopic data featuring broad macromolecular peaks[23] and for estimating concentrations from metabolite mixture spectra[23] For use of these methods, determination of the number of components is the most important task. This inconsistency makes it difficult to use ICA/NMF/MCR, as using the wrong number of components in the analysis decreases the reliability of the results

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call