Abstract
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed to discover which features are significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to those of the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper presents a comprehensive overview of a statistical analysis workflow for ranking the relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel to all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. Using these high-dimensional LC/MS datasets, we demonstrate the influence of respecting or violating the mathematical assumptions on which univariate statistical tests rely. Issues in data analysis such as determination of sample size, analytical variation, assumptions of normality and homoscedasticity, and correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.
Highlights
The comprehensive detection and quantification of metabolites in biological systems, coined as ‘metabolomics’, offers a new approach to interrogate mechanistic biochemistry related to natural processes such as health and disease
Relevant mzRT features for tandem mass spectrometry (MS/MS) identification are typically selected based on statistical criteria, either by multivariate data analysis or multiple independent univariate tests
This paper aims to investigate the impact of univariate statistical issues on LC/MS-based metabolomic experiments, in small, focused studies
Summary
The comprehensive detection and quantification of metabolites in biological systems, coined as ‘metabolomics’, offers a new approach to interrogate mechanistic biochemistry related to natural processes such as health and disease. Database matching represents only a putative metabolite assignment that must be confirmed by comparing the retention time and/or MS/MS data of a pure model compound to those of the feature of interest in the research sample. These additional analyses are time consuming and represent the rate-limiting step of the untargeted metabolomic workflow. Relevant mzRT features for MS/MS identification are typically selected based on statistical criteria, either by multivariate data analysis or multiple independent univariate tests. This paper aims to investigate the impact of univariate statistical issues on LC/MS-based metabolomic experiments in small, focused studies (e.g., small clinical trials or animal studies). To this end, we explore the nature of four real and independent datasets, evaluate the challenges and limitations of executing multiple univariate tests and illustrate available shortcuts. All methods described in this paper are based on scripts programmed either in MATLAB™ (MathWorks, Natick, MA) or R [13].
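As an illustration of the multiple-testing issue raised above, the following is a minimal sketch of a false discovery rate correction applied to one p-value per mzRT feature. It assumes the Benjamini-Hochberg procedure, which is a common choice but is not stated in this summary to be the authors' method, and it is written in Python rather than the MATLAB or R scripts the paper refers to. The function names `benjamini_hochberg` and `select_features` and the 0.05 threshold are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: p_values holds one raw p-value per mzRT feature, obtained
    from whichever univariate test suits the data.
    """
    m = len(p_values)
    # Indices of the features sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_features(p_values, fdr=0.05):
    """Return indices of features whose adjusted p-value is at most fdr."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= fdr]


if __name__ == "__main__":
    # Hypothetical raw p-values for four features (not data from the paper).
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
    print(select_features(raw, fdr=0.03))
```

Features returned by `select_features` would be the candidates to carry forward to MS/MS identification. With thousands of features tested in parallel, ranking by adjusted rather than raw p-values limits the number of false positives sent to that rate-limiting step.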