Normalization methods for reducing interbatch effect without quality control samples in liquid chromatography-mass spectrometry-based studies.

Alisa O Tokareva, Alexey S Kononikhin, Natalia L Starodubtseva, Vitaliy V Chagovets, Eugene N Nikolaev, Vladimir E Frankevich

doi:10.1007/s00216-021-03294-8

Abstract

Data normalization is an essential part of large-scale untargeted mass spectrometry metabolomics analysis. Autoscaling, Pareto scaling, range scaling, and level scaling methods for liquid chromatography-mass spectrometry data processing were compared with the most common normalization methods, including quantile normalization, probabilistic quotient normalization, and variance stabilizing normalization. These methods were tested on eight datasets from various clinical studies. The efficiency of data normalization was assessed in two ways: by the distances between batch clusters and between clinical-group clusters in the space of principal components, and by the numbers of features with pairwise statistically significant differences between batches and between clinical groups. Autoscaling demonstrated the most effective reduction in interbatch variation and can be preferable to probabilistic quotient or quantile normalization in liquid chromatography-mass spectrometry data.
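
For orientation, the four scaling methods named above can be sketched in a few lines of Python. The abstract does not define them, so this sketch uses the definitions commonly given in the metabolomics literature: each feature is mean-centered and then divided by its standard deviation (autoscaling), the square root of its standard deviation (Pareto scaling), its range (range scaling), or its mean (level scaling). The function names, the use of the sample standard deviation, and the choice to scale each batch separately are assumptions made for illustration, not details confirmed by the abstract.

```python
import math
import statistics


def scale_feature(values, method="auto"):
    """Mean-center one feature (a list of intensities) and divide by a scale factor."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1); an assumption
    divisors = {
        "auto": sd,                         # autoscaling
        "pareto": math.sqrt(sd),            # Pareto scaling
        "range": max(values) - min(values), # range scaling
        "level": mean,                      # level scaling
    }
    divisor = divisors[method]
    if divisor == 0:
        raise ValueError("scale factor is zero; feature cannot be scaled")
    return [(v - mean) / divisor for v in values]


def scale_within_batches(values, batches, method="auto"):
    """Scale one feature separately inside each batch (hypothetical helper)."""
    scaled = [0.0] * len(values)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        batch_scaled = scale_feature([values[i] for i in idx], method)
        for i, s in zip(idx, batch_scaled):
            scaled[i] = s
    return scaled
```

When scaling is applied within each batch in this way, every batch ends up with the same center for each feature (and, for autoscaling, the same unit variance), which is one plausible reading of why such methods can reduce interbatch offsets without quality control samples.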

Full Text