Abstract

Introduction

The generic metabolomics data processing workflow is constructed from a serial set of processes including peak picking, quality assurance, normalisation, missing value imputation, transformation and scaling. The combination of these processes should present the experimental data in an appropriate structure so as to identify the biological changes in a valid and robust manner.

Objectives

Currently, different researchers apply different data processing methods and no assessment of the permutations applied to UHPLC-MS datasets has been published. Here we wish to define the most appropriate data processing workflow.

Methods

We assess the influence of normalisation, missing value imputation, transformation and scaling methods on univariate and multivariate analysis of UHPLC-MS datasets acquired for different mammalian samples.

Results

Our studies have shown that once data are filtered, missing values are not correlated with m/z, retention time or response. Following an exhaustive evaluation, we recommend PQN normalisation with no missing value imputation and no transformation or scaling for univariate analysis. For PCA we recommend applying PQN normalisation with Random Forest missing value imputation, generalised logarithm (glog) transformation and no scaling method. For PLS-DA we recommend PQN normalisation, KNN as the missing value imputation method, glog transformation and no scaling. These recommendations are based on searching for the biologically important metabolite features independent of their measured abundance.

Conclusion

The appropriate choice of normalisation, missing value imputation, transformation and scaling methods differs depending on the data analysis method, and the choice of method is essential to maximise the biological derivations from UHPLC-MS datasets.

Electronic supplementary material

The online version of this article (doi:10.1007/s11306-016-1030-9) contains supplementary material, which is available to authorized users.

Highlights

  • The generic metabolomics data processing workflow is constructed with a serial set of processes including peak picking, quality assurance, normalisation, missing value imputation, transformation and scaling

  • The appropriate choice of normalisation, missing value imputation, transformation and scaling methods differs depending on the data analysis method and the choice of method is essential to maximise the biological derivations from UHPLC-MS datasets

  • The percentage of missing values was calculated and an assessment to determine whether missing values were correlated with m/z, retention time or response was performed for four different datasets acquired applying two different analytical methods and three different UHPLC-MS platforms (Accela UHPLC coupled to LTQ-Orbitrap Velos, Ultimate3000 coupled to LTQ-FT Ultra, and Ultimate3000 coupled to Q Exactive)
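The missing value assessment in the last highlight can be illustrated with a minimal sketch. It assumes a peak matrix stored as a list of rows (one per sample) with `None` marking a missing value, and it uses a Pearson coefficient between per-feature missing fraction and m/z. The function names, the data layout and the choice of Pearson correlation are all assumptions made for illustration; the source does not state which statistic was used.

```python
import math


def percent_missing(matrix):
    """Percentage of missing (None) entries in a samples x features matrix."""
    total = sum(len(row) for row in matrix)
    missing = sum(1 for row in matrix for value in row if value is None)
    return 100.0 * missing / total


def missing_fraction_per_feature(matrix):
    """Fraction of samples in which each feature (column) is missing."""
    n_samples = len(matrix)
    n_features = len(matrix[0])
    return [
        sum(1 for row in matrix if row[j] is None) / n_samples
        for j in range(n_features)
    ]


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: 3 samples x 4 features, with illustrative m/z values.
peak_matrix = [
    [1.0, None, 3.0, 4.0],
    [1.0, None, None, 4.0],
    [1.0, 2.0, 3.0, 4.0],
]
mz_values = [100.0, 200.0, 300.0, 400.0]

overall = percent_missing(peak_matrix)
per_feature = missing_fraction_per_feature(peak_matrix)
r_mz = pearson(per_feature, mz_values)
```

The same `pearson` call could be repeated with retention time or mean response in place of `mz_values`. A coefficient close to zero would be consistent with missing values that are unrelated to that property.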


Summary

Introduction

The generic metabolomics data processing workflow is constructed from a serial set of processes including peak picking, quality assurance, normalisation, missing value imputation, transformation and scaling. The combination of these processes should present the experimental data in an appropriate structure so as to identify the biological changes in a valid and robust manner. Following the acquisition of three-dimensional raw data (m/z vs retention time vs response), the first process to convert these raw data into biological knowledge is peak picking (or deconvolution), which aligns and integrates data across multiple samples. Software packages such as XCMS (Smith et al. 2006) and mzMine (Katajamaa et al. 2006) are freely available and commonly applied. A number of papers do not even define which processing methods were applied.
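To make two of the workflow steps concrete, the following is a minimal sketch of probabilistic quotient normalisation (PQN) followed by a generalised logarithm (glog) transformation. It is an illustration under stated assumptions, not the authors' implementation: the reference profile is taken as the per-feature median over all samples (a QC-based reference is another common choice), missing values are represented as `None`, and the glog form `log(x + sqrt(x^2 + lam))` with a fixed `lam` is one of several variants in use. The names `pqn_normalise`, `glog` and `lam` are hypothetical.

```python
import math
from statistics import median


def pqn_normalise(samples):
    """PQN sketch. samples: list of rows (one per sample), None = missing.

    Assumes every sample shares at least one observed feature with the
    reference profile; otherwise statistics.median raises StatisticsError.
    """
    n_features = len(samples[0])

    # Reference profile: per-feature median over the samples where it was observed.
    reference = []
    for j in range(n_features):
        observed = [row[j] for row in samples if row[j] is not None]
        reference.append(median(observed) if observed else None)

    normalised = []
    for row in samples:
        # Quotients against the reference, skipping missing or zero reference values.
        quotients = [x / r for x, r in zip(row, reference) if x is not None and r]
        factor = median(quotients)  # estimated dilution factor for this sample
        normalised.append([None if x is None else x / factor for x in row])
    return normalised


def glog(x, lam=1.0):
    """Generalised log; lam is an assumed fixed value, normally estimated from data."""
    return math.log(x + math.sqrt(x * x + lam))


# Hypothetical toy data: the second sample is a 2x concentrated copy of the
# first, and the third is a 0.5x diluted copy with one missing value.
raw = [
    [10.0, 20.0, 30.0],
    [20.0, 40.0, 60.0],
    [5.0, None, 15.0],
]
pqn = pqn_normalise(raw)
transformed = [[None if x is None else glog(x) for x in row] for row in pqn]
```

In this toy case the three samples collapse onto the same profile after PQN, which is the intended effect of correcting for overall dilution. Missing values are passed through untouched so that an imputation step, where one is used, can be applied separately.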
