Abstract
The increasing use of mass cytometry for analyzing clinical samples offers the possibility to perform comparative analyses across public datasets. However, challenges in batch normalization and data integration limit the comparison of datasets not intended to be analyzed together. Here, we present a data integration strategy, CytofIn, using generalized anchors to integrate mass cytometry datasets from the public domain. We show that low-variance controls, such as healthy samples and stable channels, are inherently homogeneous, robust against stimulation, and can serve as generalized anchors for batch correction. Single-cell quantification comparing mass cytometry data from 989 leukemia files pre- and post-normalization with CytofIn demonstrates effective batch correction while recapitulating the gold-standard bead normalization. CytofIn integration of public cancer datasets enabled the comparison of immune features across histologies and treatments. We demonstrate that CytofIn can integrate public datasets without requiring identical control samples or bead standards, enabling fast and robust analysis.
Highlights
The increasing use of mass cytometry for analyzing clinical samples offers the possibility to perform comparative analyses across public datasets
Batch normalization across mass cytometry datasets remains a major bottleneck for performing large-scale data integration from public databases
Current approaches for mass cytometry data normalization across datasets often demand identical replicates or bead standards, whose absence can hamper cross-dataset comparison
Summary
The increasing use of mass cytometry for analyzing clinical samples offers the possibility to perform comparative analyses across public datasets. Single-cell quantification comparing mass cytometry data from 989 leukemia files pre- and post-normalization with CytofIn demonstrates effective batch correction while recapitulating the gold-standard bead normalization. We demonstrate the ability to integrate public datasets without necessitating identical control samples or bead standards for fast and robust analysis using CytofIn.

Mass cytometry (cytometry by time of flight, or CyTOF) is an increasingly widespread technique for the discovery and monitoring of cell populations using single-cell, high-parameter protein measurements[1]. Batch effects remain a major limiting factor when comparing mass cytometry datasets. When batch effects are present, biological signals can be confounded by technical noise unrelated to the underlying biology, making data interpretation and inference challenging. Methods like CytofRUV and CytoNorm include identical technical replicates (aliquots of the same sample) in each batch to correct data distributions of protein signals based on a goal distribution[18–20]. A method to enable cross-dataset comparison without identical technical replicates is needed (Supplementary Table 1)[19].
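The anchor idea described above can be illustrated with a minimal sketch. This is not the CytofIn implementation: it assumes that each batch contributes cells from a low-variance anchor (for example, a healthy control sample), that a per-channel mean and standard deviation summarize the anchor adequately, and that signals are arcsinh-transformed with a cofactor of 5, a common convention for mass cytometry. The function names (`arcsinh_transform`, `anchor_stats`, `normalize_batch`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def arcsinh_transform(x, cofactor=5.0):
    """Arcsinh-transform raw ion counts (cofactor of 5 is an assumed convention)."""
    return np.arcsinh(np.asarray(x, dtype=float) / cofactor)


def anchor_stats(anchor_cells):
    """Per-channel mean and standard deviation of anchor cells.

    anchor_cells: 2-D array, rows are cells and columns are channels.
    """
    return anchor_cells.mean(axis=0), anchor_cells.std(axis=0)


def normalize_batch(batch_cells, batch_anchor, reference_anchor, eps=1e-8):
    """Shift and scale a batch so its anchor matches a reference anchor.

    Each channel is standardized with the batch's own anchor statistics and
    then mapped onto the reference anchor's mean and spread. This simple
    mean/variance match is an illustrative stand-in for whatever correction
    a real integration tool applies.
    """
    mu_b, sd_b = anchor_stats(batch_anchor)
    mu_r, sd_r = anchor_stats(reference_anchor)
    return (batch_cells - mu_b) / (sd_b + eps) * sd_r + mu_r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, already-transformed anchor data for three channels.
    reference_anchor = rng.normal(1.0, 0.5, size=(500, 3))
    # Simulate a batch effect as a per-channel scale and offset.
    batch_anchor = reference_anchor * 1.5 + 2.0
    corrected = normalize_batch(batch_anchor, batch_anchor, reference_anchor)
    # After correction, the batch anchor's means should sit on the reference means.
    print(np.abs(corrected.mean(axis=0) - reference_anchor.mean(axis=0)).max())
```

In this sketch the anchors need not be aliquots of the same sample; they only need to be similar enough across batches that their summary statistics are comparable, which is the property the text attributes to low-variance controls.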