Abstract
Current flow cytometric analysis of blood and bone marrow samples for the diagnosis of acute leukemias relies heavily on manual intervention in both the processing and analysis steps, which introduces significant subjectivity into the resulting diagnosis and increases diagnostic turn-around time. In addition, concurrent molecular characterization of these samples via cytogenetics and targeted sequencing panels can take multiple days, delaying patient diagnosis and treatment. Attention-based multi-instance learning models are machine learning models that can make accurate predictions about the classification of a sample from its multiple events/cells and generate interpretable insights into that classification; while these models have been developed for anatomic pathology applications, they have yet to be applied to flow cytometry data. Using 1,820 flow cytometry samples collected from 2019 to 2022 at Brigham and Women’s Hospital, we developed attention-based multi-instance machine learning models for automated diagnosis of acute leukemia, including differentiation of acute myeloid leukemia (AML) from B-lymphoblastic leukemia/lymphoma (B-ALL). Using concurrent cytogenetic and targeted sequencing data from 674 acute leukemia samples, we also developed machine learning models that predict molecular aberrancies from flow cytometry data. The models were built with the TabNet deep learning architecture and the VIME self-supervised training algorithm, which are state-of-the-art approaches to machine learning on tabular data, including flow cytometry data. The attention-based multi-instance models demonstrated strong performance for the automated diagnosis of acute leukemia versus non-leukemia samples (AUROC 0.869), as well as for the separation of AML from B-ALL samples (AUROC 0.971). These models also accurately predicted cytogenetic aberrancies among AML samples, including t(15;17);PML::RARA (AUROC 1.00), as well as mutations including NPM1 (AUROC 0.725).
These models do not require any manual intervention including compensation or gating, and additionally provide quantitative scores for the relative importance of different flow cytometry events and markers for the diagnosis of a particular sample. These importance scores can be integrated into flow cytometry analysis software for visualization and interpretation by hematopathologists. In this study, we have demonstrated the capability of machine learning models to provide automated diagnoses of acute leukemia, as well as accurately predict cytogenetic and molecular aberrancies in blood and bone marrow samples using flow cytometry data. This automated workflow can significantly decrease diagnostic turn-around time and ultimately improve patient outcomes.
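To illustrate how an attention-based multi-instance model can yield a sample-level representation together with per-event importance scores, the following is a minimal NumPy sketch of attention pooling over the events of a single sample. It is not the TabNet/VIME model used in this study: the function name `attention_mil_pool`, the parameters `V` and `w`, the dimensions, and the random inputs are all hypothetical and chosen only for illustration.

```python
import numpy as np

def attention_mil_pool(events, V, w):
    """Attention pooling over one sample (a "bag") of flow cytometry events.

    events: (n_events, n_features) array, one row per event/cell
    V:      (n_hidden, n_features) projection matrix (hypothetical, untrained)
    w:      (n_hidden,) scoring vector (hypothetical, untrained)
    Returns the pooled sample embedding and the per-event attention weights.
    """
    # One unnormalized attention score per event.
    scores = np.tanh(events @ V.T) @ w
    # Softmax over events; subtract the max for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Sample embedding is the attention-weighted average of its events.
    bag_embedding = weights @ events
    return bag_embedding, weights

# Synthetic stand-in data: 1000 events with 12 marker channels (assumed sizes).
rng = np.random.default_rng(0)
n_events, n_features, n_hidden = 1000, 12, 8
events = rng.normal(size=(n_events, n_features))
V = rng.normal(size=(n_hidden, n_features))
w = rng.normal(size=n_hidden)

bag_embedding, weights = attention_mil_pool(events, V, w)

# Indices of the five events with the highest attention weight.
top_events = np.argsort(weights)[::-1][:5]
```

In a trained model, `bag_embedding` would feed a classifier head, and `weights` would play the role of the per-event importance scores that could be overlaid on flow cytometry plots for review.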