Abstract

Introduction: Cytometry by Time-of-Flight (CyTOF) is a single-cell proteomic assay that tags surface or intracellular proteins with antibodies conjugated to lanthanide metals, which are then quantified by mass spectrometry. Compared with flow cytometry, CyTOF substantially reduces signal overlap between channels and increases the maximum number of protein features measured per cell from 8 to 50. CyTOF data are commonly analyzed with a ‘gating’ workflow: the iterative process of plotting cells against two protein features at a time and selecting discrete cell populations. Although CyTOF can comprehensively profile nuanced cell types, gating scales poorly with the number of features, since the number of possible two-feature plots grows quadratically with panel size (a 50-marker panel yields 1,225 pairs). Gating biases often arise from the forced discretization of cell populations, the subjectivity of gate boundaries, and intrinsic sample-to-sample variation. Current challenges in CyTOF data analysis stem largely from data volume and gating bias, which can cause disease-related cell populations to be overlooked and can hinder reproducibility.

Methods: We developed a computational platform for processing and analyzing large-scale CyTOF data. The workflow takes sample files in Flow Cytometry Standard (FCS) format as input and includes modular components for data harmonization, normalization, transformation, batch correction, quality control, subsampling, and dimensionality reduction via UMAP. Additional analytical modules that deploy large-scale machine learning models for cell type identification are currently being integrated. The outputs of these models serve as input for multi-scale analysis, which aims to link single-cell CyTOF data with clinical outcomes. We test the workflow on previously published datasets and apply it to new leukemia patient cohorts of blood and bone marrow samples measured with a variety of marker panels.

Results: We tested our workflow on previously published datasets comprising 4.7 M peripheral blood mononuclear cells from healthy donors (N = 20) and T cells (N = 8).
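As a rough illustration of two of the preprocessing modules listed under Methods, the Python sketch below applies an arcsinh transformation to raw marker intensities and then caps the number of cells drawn from each sample. The abstract does not state which transform, parameters, or subsampling scheme the platform uses: the cofactor of 5 (a common choice for CyTOF data), the function names `arcsinh_transform` and `subsample_per_sample`, and the toy intensities are all hypothetical.

```python
import math
import random

# Hypothetical parameter: a cofactor of 5 is a common choice for CyTOF data,
# but the abstract does not say which transform or cofactor the platform uses.
COFACTOR = 5.0


def arcsinh_transform(cells, cofactor=COFACTOR):
    """Apply asinh(x / cofactor) to every marker intensity of every cell.

    The transform is roughly linear near zero and logarithmic for large
    values, which compresses the wide dynamic range of mass cytometry counts.
    """
    return [[math.asinh(x / cofactor) for x in cell] for cell in cells]


def subsample_per_sample(samples, n_per_sample, seed=0):
    """Draw up to n_per_sample cells from each sample without replacement.

    Hypothetical helper: uniform per-sample capping is one simple scheme;
    the platform's actual subsampling strategy is not described.
    """
    rng = random.Random(seed)  # fixed seed for reproducible draws
    out = {}
    for name, cells in samples.items():
        k = min(n_per_sample, len(cells))
        out[name] = rng.sample(cells, k)
    return out


if __name__ == "__main__":
    # Toy data (invented): each cell is a list of raw marker intensities.
    samples = {
        "donor_A": [[0.0, 10.0, 250.0], [5.0, 0.0, 1200.0], [50.0, 3.0, 80.0]],
        "donor_B": [[1.0, 2.0, 3.0]],
    }
    transformed = {k: arcsinh_transform(v) for k, v in samples.items()}
    sub = subsample_per_sample(transformed, n_per_sample=2)
    for name, cells in sub.items():
        print(name, len(cells))  # prints "donor_A 2" then "donor_B 1"
```

Capping cells per sample before UMAP keeps samples with many cells from dominating the embedding; a stratified or density-aware scheme would be an equally plausible alternative.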
We observed that samples mixed well upon integration and that cells separated into major cell types such as monocytes, CD4+/CD8+ T cells, naïve/mature B cells, and natural killer T cells. We then applied the workflow to a bone marrow cohort of healthy donors (N = 10), where we observed good mixing of samples but less distinct separation of cell types. Finally, we applied the workflow to 9.3 M cells from 51 chronic lymphocytic leukemia (CLL) patients and 4 healthy donors, using a lymphocytic marker panel. Here, we clearly resolved a major clonal B-cell population and immune T cells. We are currently applying the workflow to acute myeloid leukemia (AML) patient data with myeloid and immune marker panels.

Conclusions: We developed a computational platform that facilitates the simultaneous analysis of CyTOF data that are large in both the number of samples and the number of cells per sample, across a variety of marker panels, while minimizing the biases common in traditional gating analysis.

Citation Format: Garth Kong, Tania Vu, Evan Lind, Olga H. Nikolova. Large-scale CyTOF data modeling of leukemia patient cohorts [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4961.