ABSTRACT
Sample collection can significantly affect lipid concentration measurements in cell line panels, concealing intrinsic differences between cancer subtypes. Most quality control steps in lipidomic data analysis focus on controlling technical variation. Correcting for differences in the total amount of biological material remains an additional challenge for cell line panels. Here, we investigated how to normalize lipidomic data acquired from multiple cell lines to correct for differences in sample biomass. We studied how commonly used data normalization and transformation strategies influence the resulting lipid data distributions. We compared normalization by biological properties, including cell count and total protein concentration, with statistical and data‐based approaches such as median, mean, or probabilistic quotient‐based normalization. We used intraclass correlations to estimate how normalization influenced the similarity between replicates. Normalizing lipidomic data by cell count improved the similarity between replicates, but only for cell lines with similar morphologies. When comparing cell line panels with diverse morphologies, neither cell count nor protein concentration was sufficient to increase the similarity of lipid abundances between cell line replicates. Data‐based normalizations increased these similarities but introduced a bias towards triglycerides, a large and variable lipid class. These artifacts were reduced by normalizing to the abundance of structural lipids only. We conclude that there is a delicate balance between improving the similarity between replicates and avoiding artifacts in lipidomic data, and we emphasize the importance of choosing an appropriate normalization strategy when studying biological phenomena with lipidomics.
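The normalization families compared in the abstract can be illustrated with a short, dependency-free Python sketch. This is not the authors' pipeline and makes several assumptions: each sample is a list of lipid abundances in a shared feature order; PQN is a simplified variant (per-feature median reference, no initial total-sum step); the structural-lipid indices are a hypothetical, user-supplied list; and replicate similarity uses the one-way random-effects ICC(1), which may differ from the ICC form used in the study.

```python
from __future__ import annotations

from statistics import mean, median


def scale(sample: list[float], factor: float) -> list[float]:
    """Divide every lipid abundance in one sample by a single factor."""
    return [x / factor for x in sample]


def normalize_by_covariate(samples: list[list[float]], covariate: list[float]) -> list[list[float]]:
    """Biological normalization, e.g. by cell count or total protein per sample."""
    return [scale(s, c) for s, c in zip(samples, covariate)]


def median_normalize(samples: list[list[float]]) -> list[list[float]]:
    """Data-based: divide each sample by its own median abundance."""
    return [scale(s, median(s)) for s in samples]


def mean_normalize(samples: list[list[float]]) -> list[list[float]]:
    """Data-based: divide each sample by its own mean abundance."""
    return [scale(s, mean(s)) for s in samples]


def pqn_normalize(samples: list[list[float]]) -> list[list[float]]:
    """Simplified probabilistic quotient normalization (no total-sum pre-step)."""
    n_features = len(samples[0])
    # Reference profile: median of each lipid across all samples.
    reference = [median([s[j] for s in samples]) for j in range(n_features)]
    normalized = []
    for s in samples:
        # Per-lipid quotients against the reference; skip zero reference values.
        quotients = [x / r for x, r in zip(s, reference) if r > 0]
        # The median quotient is taken as the sample's dilution factor.
        normalized.append(scale(s, median(quotients)))
    return normalized


def structural_normalize(samples: list[list[float]], structural_idx: list[int]) -> list[list[float]]:
    """Divide by the summed abundance of (hypothetical) structural-lipid features only,
    so that a large, variable class such as triglycerides does not set the scale factor."""
    return [scale(s, sum(s[j] for j in structural_idx)) for s in samples]


def icc_oneway(groups: list[list[float]]) -> float:
    """One-way random-effects ICC(1) for one lipid.

    groups[i] holds the replicate values of cell line i; all groups need the
    same number of replicates k >= 2. Raises ZeroDivisionError if all values
    are identical (both mean squares are zero).
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need an equal number (>= 2) of replicates per group")
    n = len(groups)
    means = [mean(g) for g in groups]
    grand = mean(means)  # equals the overall mean because group sizes are equal
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

For example, two samples that differ only by a twofold dilution (`[1, 2, 3, 4]` and `[2, 4, 6, 8]`) become identical after `median_normalize` or `pqn_normalize`, whereas `structural_normalize` rescales each sample using only the chosen structural features, leaving other classes free to vary.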