Unit variance (UV) scaling, mean centering (CTR) scaling, and Pareto (Par) scaling are three algorithms commonly used in the preprocessing of metabolomics data. In our NMR-based metabolomics studies, we found that the clustering performance of these three scaling methods differed dramatically when tested on spectral data from 48 young athletes' urine samples, mouse spleen tissue, mouse serum, and Staphylococcus aureus cell samples. Our data suggest that, for extracting clustering information, UV scaling can serve as a robust approach for NMR metabolomics data, even in the presence of technical errors. For identifying discriminative metabolites, however, UV scaling, CTR scaling, and Par scaling could extract discriminative metabolites equally efficiently, based on the coefficient values. Based on the data presented in this study, we propose an optimal working pipeline for selecting scaling algorithms in NMR-based metabolomics analysis, which has the potential to serve as guidance for junior researchers working in this field.
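For readers unfamiliar with the three scaling methods, the following is a minimal sketch of their commonly cited textbook definitions, not the implementation used in this study. It assumes a data matrix with samples in rows and spectral variables (e.g. NMR bins) in columns, uses the sample standard deviation, and assumes no variable is constant across samples. The function names `ctr_scale`, `uv_scale`, and `pareto_scale` are hypothetical and chosen for illustration only.

```python
import numpy as np


def ctr_scale(X):
    """Mean centering: subtract each variable's (column's) mean."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)


def uv_scale(X):
    """Unit variance scaling: mean center, then divide by the column
    standard deviation, so every variable ends up with variance 1."""
    X = np.asarray(X, dtype=float)
    # ddof=1 gives the sample standard deviation (an assumption here).
    return ctr_scale(X) / X.std(axis=0, ddof=1)


def pareto_scale(X):
    """Pareto scaling: mean center, then divide by the square root of
    the column standard deviation, which damps large variables less
    aggressively than UV scaling."""
    X = np.asarray(X, dtype=float)
    return ctr_scale(X) / np.sqrt(X.std(axis=0, ddof=1))


# Toy matrix: 3 samples x 2 variables with very different magnitudes.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

X_ctr = ctr_scale(X)
X_uv = uv_scale(X)
X_par = pareto_scale(X)
```

In this toy case, mean centering leaves the second variable ten times more variable than the first, UV scaling puts both on an equal footing, and Pareto scaling falls in between. That difference in how high-intensity signals are weighted is one plausible reason the methods can behave differently in clustering.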