Background: Identification of minimal residual disease (MRD) in acute myeloid leukemia (AML) is essential for assessing the risk of relapse and guiding therapeutic management. Although molecular assays have streamlined AML MRD detection, flow cytometric immunophenotyping (FCI) remains the only available test modality for the approximately 50% of patients who lack traceable molecular MRD targets. Unfortunately, because of the high level of expertise required, the long manual analysis time, and the complexity of AML MRD FCI data, only a few reference laboratories offer this as an orderable test.
Methods: To facilitate the analysis of AML MRD by FCI, we devised an unsupervised machine learning (ML) pipeline that takes in raw FCI data, eliminates acquisition errors (FlowCut), performs state-of-the-art clustering (PARC) and dimensionality reduction (UMAP), estimates how aberrant (different from normal) each cluster is relative to a cohort of controls (15 donor and 15 reactive marrow aspirates), downsamples clusters with high event counts without significant loss of low-event clusters, and corrects for downsampling using an upsampling factor. This CCADDAS (Clinically-intended Clustering of All events, Dimensionality reduction, Downsampling and Aberrancy Scaling) pipeline was containerized, deployed in a cloud environment (Google Cloud Vertex AI), and used to process raw FCI files from 50 AML MRD FCI assays (3-tube/10-color panel) with 0.1% to <5% leukemic subsets, and from 20 assays with no leukemic subsets detectable by prior conventional analysis. The ML-enhanced data files were analyzed using commercially available FCI analysis software (Kaluza v2.1, Beckman Coulter).
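The downsampling and upsampling-factor steps described in the Methods can be illustrated with a minimal sketch. It is not the CCADDAS implementation: the function names, the uniform random sampling within each cluster, and the cap of 5,000 events per cluster are assumptions chosen for illustration. Each retained event carries a factor equal to the original cluster size divided by the number of events kept, so that counts can be scaled back after gating.

```python
import numpy as np


def downsample_by_cluster(labels, max_per_cluster=5000, seed=0):
    """Cap each cluster at max_per_cluster events (hypothetical threshold).

    Returns the sorted indices of the retained events and, for each retained
    event, an upsampling factor = original cluster size / retained size.
    Clusters at or below the cap are kept whole with a factor of 1.0.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    kept_idx, factors = [], []
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        if idx.size > max_per_cluster:
            # Assumption: uniform random sampling without replacement.
            chosen = rng.choice(idx, size=max_per_cluster, replace=False)
            factor = idx.size / max_per_cluster
        else:
            chosen, factor = idx, 1.0
        kept_idx.append(chosen)
        factors.append(np.full(chosen.size, factor))
    kept_idx = np.concatenate(kept_idx)
    factors = np.concatenate(factors)
    order = np.argsort(kept_idx)  # restore acquisition order
    return kept_idx[order], factors[order]


def estimate_original_count(factors_in_gate):
    """Estimated original events = gated events x mean upsampling factor."""
    factors_in_gate = np.asarray(factors_in_gate, dtype=float)
    if factors_in_gate.size == 0:
        return 0.0
    return factors_in_gate.size * factors_in_gate.mean()


if __name__ == "__main__":
    # Toy cluster labels: two large clusters and one small (rare) cluster.
    labels = [0] * 20000 + [1] * 300 + [2] * 12000
    kept, factors = downsample_by_cluster(labels)
    print(len(kept), "events retained of", len(labels))
    print("estimated original total:", estimate_original_count(factors))
```

In this sketch the small cluster passes through untouched, which mirrors the stated goal of preserving low-event subsets, while any gate drawn on the reduced file can be converted back to an estimated original count from the factors of the events it contains.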
Results: Cluster-informed downsampling reduced the number of events to be analyzed from 1 million to 179,531 per tube on average (82% data reduction), while preserving low-event subsets (clusters smaller than 5,000 cells) and allowing accurate quantitative estimates for any gated population (number of events × mean upsampling factor). When the enhanced, downsampled data were analyzed, MRD was detected in all positive cases and absent in all negative cases (100% concordance with analysis of the original data), with accurate quantitative estimates (Spearman r = 0.85). Mean analysis time fell from 73 minutes to 8 minutes per case (89% reduction in manual analysis time). Dimensionality reduction parameters simplified gating of lineage- and maturation-defined subsets. Cluster comparison against control samples facilitated rapid identification of MRD subsets, based on a calculated aberrancy scale parameter that helped distinguish MRD blasts (mean ± 2SD: 0.85 ± 0.3) from benign blasts (mean ± 2SD: 0.26 ± 0.3) (p < 0.0001).
Conclusion: We introduce CCADDAS, an unsupervised, containerized, cloud-based ML pipeline that enhances raw AML MRD FCI data with ML annotations, cluster-informed downsampling, and comparison to normal controls to create markedly smaller, software-agnostic FCI files. CCADDAS simplifies and accelerates detection of AML MRD in clinical diagnostics, reducing the number of cells analyzed by 82% and manual analysis time by 89% without impacting test performance. Moreover, CCADDAS can be deployed on any laboratory-developed AML MRD assay, and its compact export is compatible with any clinical FCI software, computing platform, and analysis strategy. Adoption of CCADDAS is likely to facilitate the implementation of AML MRD FCI analysis by more clinical laboratories.
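The two headline percentages follow directly from the reported averages. As a quick arithmetic check (the helper name `percent_reduction` is introduced here for illustration only):

```python
def percent_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# Events per tube: 1,000,000 before, 179,531 after downsampling (reported means).
events_reduction = percent_reduction(1_000_000, 179_531)

# Manual analysis time per case: 73 minutes before, 8 minutes after.
time_reduction = percent_reduction(73, 8)

print(round(events_reduction))  # 82
print(round(time_reduction))    # 89
```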