CytoNorm: A Normalization Algorithm for Cytometry Data.

Sofie Van Gassen, Nima Aghaeepour, Yvan Saeys, Martin S Angst, Brice Gaudilliere

doi:10.1002/cyto.a.23904

Open Access

Journal: Cytometry Part A
Publication Date: Oct 21, 2019
Citations: 135
License type: CC BY 4.0

Affiliation: Ghent University, Stanford University

Abstract

High‐dimensional flow cytometry has matured to a level that enables deep phenotyping of cellular systems at a clinical scale. The resulting high‐content data sets allow the human immune system to be characterized at unprecedented single‐cell resolution. However, the results are highly dependent on sample preparation, and measurements might drift over time. While various controls exist for assessing and improving data quality within a single sample, cross‐sample normalization attempts have so far been limited to aligning marker distributions across subjects. These approaches, inspired by bulk genomics and proteomics assays, ignore the single‐cell nature of the data and risk removing biologically relevant signals. This work proposes CytoNorm, a normalization algorithm that ensures internal consistency between clinical samples based on shared controls across study batches. Data from the shared controls are used to learn the appropriate transformation for each batch (e.g., each analysis day). Importantly, some sources of technical variation are strongly influenced by the amount of protein expressed on specific cell types, so several population‐specific transformations are required to normalize cells from a heterogeneous sample. To address this, our approach first identifies the overall cellular distribution with a clustering step, and calculates subset‐specific transformations on the control samples by computing their quantile distributions and aligning them with splines. These transformations are then applied to all other clinical samples in the batch to remove the batch‐specific variations. We evaluated the algorithm on a customized data set with two shared controls across batches. One control sample was used to calculate the normalization transformations, and the second control served as a blinded test set evaluated with the Earth Mover's distance. Additional results are provided for two real‐world clinical data sets.
Overall, our method compared favorably to standard normalization procedures. The algorithm is implemented in the R package “CytoNorm” and available via the following link: http://www.github.com/saeyslab/CytoNorm © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
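The core batch transformation described in the abstract (quantiles of a shared control per batch, aligned with splines, then applied to every sample of that batch) can be sketched for a single marker with NumPy and SciPy. This is a minimal illustration under our own assumptions, not the CytoNorm R implementation: the goal distribution is taken as the mean of the per-batch quantiles, SciPy's `PchipInterpolator` stands in for the monotone spline, the quantile grid is our choice, and `learn_batch_transforms` is a hypothetical name. In the full method such a map would be learned per marker and per cluster.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def learn_batch_transforms(controls, probs=np.linspace(0.01, 0.99, 99)):
    """controls maps batch id -> 1-D array of one marker in that batch's shared control.

    Returns batch id -> callable that maps raw values onto the common goal scale.
    """
    batch_q = {b: np.quantile(x, probs) for b, x in controls.items()}
    # Goal distribution: mean of the per-batch quantiles (an assumption for this sketch).
    goal = np.mean(np.stack(list(batch_q.values())), axis=0)
    transforms = {}
    for b, q in batch_q.items():
        # PCHIP needs strictly increasing knots; tied quantiles (e.g. many zero
        # counts) would raise an error, so keep only the first of each tie.
        q_knots, idx = np.unique(q, return_index=True)
        # PCHIP is a monotone cubic interpolant, so it never reverses the ordering of cells.
        # Outside the knot range it extrapolates with the end polynomials (a simplification).
        transforms[b] = PchipInterpolator(q_knots, goal[idx], extrapolate=True)
    return transforms

rng = np.random.default_rng(0)
ctrl_a = rng.normal(0.0, 1.0, 5000)   # control measured in batch A
ctrl_b = ctrl_a + 2.0                 # same control in batch B, with a +2 technical offset
t = learn_batch_transforms({"A": ctrl_a, "B": ctrl_b})

patient_b = rng.normal(1.5, 1.0, 3000)  # any other sample acquired in batch B
patient_b_norm = t["B"](patient_b)      # the batch-B transformation is applied to it unchanged
```

Because the toy batch-B control is just the batch-A control shifted by 2, the learned maps reduce to x + 1 for batch A and x - 1 for batch B, so both copies of the control land on the same goal scale, and every other batch-B sample receives the same correction.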

Highlights

High-dimensional flow cytometry technologies, such as mass cytometry, are increasingly employed in large clinical studies to better understand the biological mechanisms of diseases [1,2,3].
While some small aliquot-specific differences occurred, the main differences were caused by batch effects: the control and validation samples on the same plate had undergone similar changes in distribution compared to the other plates.
The model was trained on the unstimulated control samples, with a FlowSOM grid of 15 by 15 and 5 final clusters.
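The reason for clustering before normalizing (population-specific transformations, as described in the abstract) can be illustrated on simulated two-marker data in which a technical shift affects only one cell type. This is a sketch under our own assumptions, not the CytoNorm code: a tiny k-means (`fit_centers`) stands in for the FlowSOM model, the quantile maps are piecewise linear via `np.interp` (values outside the control range are clamped), and all function names and data are hypothetical. Passing a single center turns the same code into one global transformation per marker, which makes the comparison direct.

```python
import numpy as np

PROBS = np.linspace(0.01, 0.99, 99)  # quantile grid (our choice for this sketch)

def assign(x, centers):
    """Index of the nearest cluster center for every cell (row of x)."""
    return np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)

def fit_centers(x, k, n_iter=25):
    """Tiny Lloyd's k-means with farthest-first initialization (FlowSOM stand-in)."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign(x, centers)
        centers = np.array([x[labels == j].mean(0) for j in range(k)])
    return centers

def learn_maps(controls, centers, marker):
    """Per (batch, cluster): control quantiles and the goal (mean over batches)."""
    labels = {b: assign(x, centers) for b, x in controls.items()}
    maps = {}
    for j in range(len(centers)):
        qs = {b: np.quantile(x[labels[b] == j, marker], PROBS) for b, x in controls.items()}
        goal = np.mean(list(qs.values()), axis=0)
        for b, q in qs.items():
            maps[(b, j)] = (q, goal)
    return maps

def normalize(x, batch, centers, maps, marker):
    """Apply the map of each cell's cluster to one marker."""
    labels = assign(x, centers)
    y = x[:, marker].copy()
    for j in range(len(centers)):
        q, goal = maps[(batch, j)]
        in_j = labels == j
        y[in_j] = np.interp(x[in_j, marker], q, goal)
    return y

rng = np.random.default_rng(1)

def simulate(n, shift):
    """Marker 0 separates two cell types; marker 1 gets a technical shift in type 1 only."""
    cell_type = rng.integers(0, 2, n)
    lineage = np.where(cell_type == 1, 5.0, 0.0) + rng.normal(0.0, 0.5, n)
    target = rng.normal(2.0, 0.5, n) + shift * cell_type
    return np.column_stack([lineage, target]), cell_type

ctrl_a, type_a = simulate(4000, 0.0)
ctrl_b, type_b = simulate(4000, 1.0)
controls = {"A": ctrl_a, "B": ctrl_b}
pooled = np.vstack([ctrl_a, ctrl_b])

def type0_gap(centers):
    """After normalization, how far apart are the type-0 cells of batch A and batch B?"""
    maps = learn_maps(controls, centers, marker=1)
    norm_a = normalize(ctrl_a, "A", centers, maps, marker=1)
    norm_b = normalize(ctrl_b, "B", centers, maps, marker=1)
    return abs(norm_a[type_a == 0].mean() - norm_b[type_b == 0].mean())

gap_clustered = type0_gap(fit_centers(pooled, k=2))
gap_global = type0_gap(pooled.mean(0, keepdims=True))  # one cluster = one global map
print(f"type-0 gap, cluster-specific: {gap_clustered:.3f}; global: {gap_global:.3f}")
```

In this toy setup the global map pulls the type-0 cells of the two batches apart even though their true distributions are identical, because a single map has to absorb a shift that only type-1 cells carry; the cluster-specific maps leave them aligned.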

Summary

Introduction

High-dimensional flow cytometry technologies, such as mass cytometry, are increasingly employed in large clinical studies to better understand the biological mechanisms of diseases [1,2,3]. One known instrument-dependent issue specific to mass cytometry is signal fluctuation over time, caused by changes in instrument performance. This signal drift is typically corrected with polystyrene beads embedded with metals of known concentration [4]. A number of techniques have been proposed to align the distribution of markers across samples [9,10,11]. These methods align the distribution of each individual sample without using reference controls, and therefore risk removing biologically relevant differences between distributions. Here we demonstrate that technical sources of variation can affect cell types differently, as was also described in Reference 10. The authors of Reference 10 provided the option to normalize one marker at a time during the manual gating process, allowing the user to choose the subpopulation to which the normalization is applied. To allow a fully automated procedure, our algorithm first uses a clustering algorithm for automated cell type identification prior to normalization.
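The risk of aligning samples without reference controls can be made concrete with a one-marker toy example. The setup is invented for illustration: batch B reads 0.5 units high, and the patient measured in batch B also carries a real increase of 1.0 relative to the patient in batch A. Forcing both patient samples onto a common distribution erases that biological difference, whereas a map learned only from the shared control removes the batch shift and keeps it. Distances use SciPy's `wasserstein_distance`, which for one-dimensional distributions equals the Earth Mover's distance; `qmap` is a hypothetical helper, and the piecewise-linear quantile map is a simplification of the spline-based approach.

```python
import numpy as np
from scipy.stats import wasserstein_distance

PROBS = np.linspace(0.0, 1.0, 101)  # includes min and max, so in-range values are not clamped

rng = np.random.default_rng(2)
ctrl_a = rng.normal(0.0, 1.0, 5000)     # shared control measured in batch A
ctrl_b = rng.normal(0.5, 1.0, 5000)     # same control in batch B: reads 0.5 high
patient_a = rng.normal(0.0, 1.0, 5000)  # patient measured in batch A
patient_b = rng.normal(1.5, 1.0, 5000)  # +0.5 technical offset, +1.0 biological change

def qmap(values, src, dst):
    """Monotone piecewise-linear map sending the quantiles of src onto those of dst."""
    return np.interp(values, np.quantile(src, PROBS), np.quantile(dst, PROBS))

# (1) Control-free alignment: force each patient sample onto the pooled distribution.
pooled = np.concatenate([patient_a, patient_b])
aligned_a = qmap(patient_a, patient_a, pooled)
aligned_b = qmap(patient_b, patient_b, pooled)

# (2) Control-based: learn the batch-B -> batch-A map on the control only,
# then apply it to the patient sample from batch B.
corrected_b = qmap(patient_b, ctrl_b, ctrl_a)

emd_truth = wasserstein_distance(patient_a, patient_b - 0.5)  # batch effect removed by construction
emd_aligned = wasserstein_distance(aligned_a, aligned_b)
emd_control = wasserstein_distance(patient_a, corrected_b)
print(f"truth {emd_truth:.2f}, control-free {emd_aligned:.2f}, control-based {emd_control:.2f}")
```

Control-free alignment drives the patient-to-patient distance to nearly zero, while the control-based map keeps it close to the distance obtained when the batch effect is removed by construction.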

Similar Papers

On Efficient Query Processing with the Earth Mover's Distance
Merih Seran Uysal ... Thomas Seidl
03 Nov 2014

Local earth mover's distance and face warping [multimedia object distance measure]
S.H Srinivasan
30 Jun 2004

Relevance Feedback for the Earth Mover’s Distance
Marc Wichterich ... Christian Beecks
01 Jan 2010

Efficient Filter Approximation Using the Earth Mover's Distance in Very Large Multimedia Databases with Feature Signatures
Merih Seran Uysal ... Christian Beecks
03 Nov 2014
