A biplot correlation range for group-wise metabolite selection in mass spectrometry

Youngja H Park, Kichun Lee, Taewoon Kong, Dean P Jones, James R Roede

doi:10.1186/s13040-019-0191-2


Open Access


Abstract

Background: Analytic methods are available to acquire extensive metabolic information cost-effectively for personalized medicine, yet disease risk assessment and diagnosis still rely mostly on individual biomarkers based on the statistical principles of false discovery rate and correlation. Because complex biologic systems have functional redundancies and multiple layers of regulation, individual biomarkers, while useful, are inherently limited in characterizing disease. Data reduction and discriminant analysis tools such as principal component analysis (PCA), partial least squares (PLS), and orthogonal PLS (O-PLS) can separate metabolic phenotypes, but they offer no statistical basis for selecting group-wise metabolites as contributors to those phenotypes.

Methods: We present a dimensionality-reduction-based approach, termed ‘biplot correlation range (BCR)’, that combines biplot correlation analysis with direct orthogonal signal correction and PLS to select, group-wise, the metabolic markers contributing to metabolic phenotypes.

Results: Using a simulated multiple-layer system of the kind that often arises in complex biologic systems, we show the feasibility and superiority of the proposed approach compared with existing approaches based on false discovery rate and correlation. To demonstrate the method on a real-life dataset, we used LC-MS-based metabolomics to determine the spectrum of metabolites present in liver mitochondria from wild-type (WT) mice and thioredoxin-2 transgenic (TG) mice. Using BCR, we select discriminatory variables by their increased score in the direction of class identity. The results show that BCR can identify metabolites contributing to class separation that methods based on false discovery rate or statistical total correlation spectroscopy can hardly find, which matters for complex data analysis in predictive health and personalized medicine.
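The abstract contrasts BCR with feature selection based on the false discovery rate (FDR). As a point of reference, the sketch below shows the standard Benjamini-Hochberg step-up procedure, a common way to control FDR when thousands of metabolic features are tested at once. It assumes per-feature p-values have already been computed (for example, by a per-feature two-group test); the function name `benjamini_hochberg` and the level `q` are illustrative, and the paper does not state that this exact procedure was used.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of features declared significant at FDR level q.

    Benjamini-Hochberg step-up procedure: sort the m p-values, find the
    largest rank k with p_(k) <= (k / m) * q, and reject the hypotheses
    belonging to the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # feature indices, smallest p first
    thresholds = q * np.arange(1, m + 1) / m   # (k / m) * q for k = 1..m
    below = p[order] <= thresholds             # which ranked p-values meet their bound
    selected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # largest rank meeting the bound (step-up)
        selected[order[: k + 1]] = True        # reject every hypothesis up to that rank
    return selected
```

For the ten p-values 0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212 and 0.216 at q = 0.05, only the two smallest pass. Each feature is judged on its own marginal evidence, which is exactly the limitation for group-wise behavior that the abstract describes.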

Highlights

We explore whether a biplot correlation range (BCR), determined from the scores and loadings of principal component analysis (PCA) and extended to partial least squares (PLS) and orthogonal-signal-correction PLS (OPLS), can select biomarkers for discriminant analysis of mass spectral data from mitochondria isolated from wild-type (WT) mice and thioredoxin-2 (Trx2) transgenic (TG) mice.
We developed a dimensionality-reduction-based approach, termed biplot correlation range, that improves the reliability of selecting metabolites that contribute to group behavior in metabolic profiling applications for personalized medicine.
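The highlights describe a correlation range built from PCA scores and loadings. The following minimal sketch, assuming autoscaled data and PCA computed by singular value decomposition, returns the scores, the loadings, and the correlation of each feature with each score vector, which are the quantities a correlation biplot displays. It is not the authors' BCR: their method additionally defines a correlation range and works with DOSC-corrected PLS models, details that are not given here. The function names, the toy data, and the 0.7 cutoff in the demo are hypothetical.

```python
import numpy as np


def pca_scores_loadings(X, n_components=2):
    """PCA by singular value decomposition of the autoscaled data matrix.

    Each feature (column) is mean-centred and scaled to unit variance first.
    Returns the autoscaled matrix Xs, the scores T (n_samples x k) and the
    unit-norm loadings P (n_features x k), so that Xs ~= T @ P.T.
    """
    X = np.asarray(X, dtype=float)
    # A constant feature would divide by zero here; filter such features first.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # sample scores
    P = Vt[:n_components].T                      # variable loadings
    return Xs, T, P


def correlation_loadings(Xs, T):
    """Pearson correlation of every feature with every score vector.

    Entry [j, a] is corr(x_j, t_a). In a correlation biplot these values place
    feature j inside the unit circle; features close to the rim along a
    component that separates the groups are natural candidates.
    """
    Xc = Xs - Xs.mean(axis=0)
    Tc = T - T.mean(axis=0)
    cov = Xc.T @ Tc
    norms = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Tc, axis=0))
    return cov / norms


if __name__ == "__main__":
    # Hypothetical toy data: 2 groups x 30 samples, 6 features;
    # features 0-2 are shifted in the second group, 3-5 are pure noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))
    X[30:, :3] += 10.0
    Xs, T, P = pca_scores_loadings(X)
    R = correlation_loadings(Xs, T)
    # Illustrative cutoff only, not the BCR criterion from the paper.
    print(np.flatnonzero(np.abs(R[:, 0]) >= 0.7))
```

For autoscaled data the correlation of feature j with component a also equals p_ja · s_a / √(n − 1), where s_a is the a-th singular value, so the same information can be read directly off the loadings.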

Summary

Introduction

Contemporary analytic methods, such as liquid chromatography-mass spectrometry (LC-MS) [1, 2], gas chromatography-mass spectrometry (GC-MS) [3, 4], and proton nuclear magnetic resonance (1H NMR) spectroscopy [5, 6], provide information-rich data sets that can be of substantial value in biomedical research and, in principle, can be developed with bioinformatics procedures for routine healthcare [7,8,9]. The recent introduction of adaptive processing by apLCMS [10] provides a systematic approach to reduce noise and extract relative quantification of > 7000 metabolic features in 50 aliquots of human plasma in 20 min [2]; more recent improvements in data processing have shown that > 12,000 metabolic features can be extracted [11]. This high volume of inherently multivariate information presents challenges to its reliable use in health prediction and disease management. Data reduction and discriminant analysis tools such as principal component analysis (PCA), partial least squares (PLS), and orthogonal PLS (O-PLS) provide approaches to separate metabolic phenotypes, but they do not offer a statistical basis for selecting group-wise metabolites as contributors to metabolic phenotypes.
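PLS differs from PCA in that its components are chosen to covary with the class labels rather than only to explain variance in the data. The sketch below, assuming a binary class code y (0 for one group, 1 for the other) as in PLS discriminant analysis, implements the textbook NIPALS algorithm for a single response (PLS1). It does not include the direct orthogonal signal correction step that the BCR method applies, and `pls1_nipals` is a hypothetical name, not a library function.

```python
import numpy as np


def pls1_nipals(X, y, n_components=2):
    """PLS1 (single response) by NIPALS, e.g. PLS-DA with a 0/1 class code.

    For each component a: w_a = X_a^T y_a / ||X_a^T y_a||, t_a = X_a w_a,
    p_a = X_a^T t_a / (t_a^T t_a), c_a = y_a^T t_a / (t_a^T t_a); X and y are
    then deflated by t_a before the next component is extracted.
    Returns weights W, scores T, X-loadings P (one column per component)
    and the y-loadings c.
    """
    Xa = np.asarray(X, dtype=float)
    ya = np.asarray(y, dtype=float)
    Xa = Xa - Xa.mean(axis=0)          # centre every feature
    ya = ya - ya.mean()                # centre the response
    W, T, P, c = [], [], [], []
    for _ in range(n_components):
        w = Xa.T @ ya
        w = w / np.linalg.norm(w)      # direction of maximal covariance with y
        t = Xa @ w                     # sample scores on this component
        tt = t @ t
        p = Xa.T @ t / tt              # X-loadings
        ca = (ya @ t) / tt             # y-loading (regression of y on t)
        Xa = Xa - np.outer(t, p)       # deflate X
        ya = ya - ca * t               # deflate y
        W.append(w)
        T.append(t)
        P.append(p)
        c.append(ca)
    return np.column_stack(W), np.column_stack(T), np.column_stack(P), np.array(c)
```

Because the first weight vector is proportional to X^T y, samples of the class coded 1 receive higher first-component scores on average. This is one concrete reading of a score "in the direction of class identity", though the paper's exact criterion may differ.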

Methods

Results

Conclusion