Real-World Demonstration of AI to Clinical Cytometry for Rapid and Reliable Decision Support and Automated Reporting of Blood Cancers Including B-NHL and Acute Leukemias

Karsten Ca Miermans, Jurgen Alois Riedl, Holger Hauspurg, Felix Kunzweiler, Franz Elsner, Florian Pfisterer, Tim Adler, Hannes Lüling, Aleksandra Mezydlo, Felix Spöttel, Artur Toloknieiev, João Moura Alves

doi:10.1182/blood-2023-182082

Abstract

Introduction: Flow cytometry is an integral part of routine diagnostics for hematological malignancies. However, a decreasing number of skilled operators must cope with increasing case volumes, and current analysis methods are known to suffer from inter- and intra-observer variability.

Aim: To reduce labor time, dependency on expert knowledge, and interpretation variability, we aimed to build clinical-grade decision support software for clinical cytometry based on machine learning (ML), and to validate it in the clinical routine.

Methods: We obtained ≈100k flow cytometry cases from multiple centers, taken from the clinical routine and annotated by expert users using multiple diagnostic methods. These cases span the full spectrum of hematological malignancies and include 47% non-malignant reference samples. Because the data were not measured using harmonized protocols, we developed data pooling techniques to merge them, including synthetic imputation of missing markers. After data processing, we applied multi-layer artificial neural networks to predict the sub-type or non-malignancy (B-NHL and acute leukemias; ten classes at the time of writing) directly from the data, without any human preprocessing. Transfer learning was then used to fine-tune the model to the specific protocol under which it would be used in the routine. Besides sub-typing recommendations, users need to report cell population frequencies, which are traditionally measured by manual ‘gating’. We therefore trained separate supervised ML models to classify single cells, using a combination of curated and refined routine gating data expanded with in-house annotations produced by a highly trained cytometrist. To enable clinicians to receive the diagnostic recommendations and produce reports for the treating physician, we built a CE-IVD web application (“hema.to”) in which selected users could upload cytometry data, inspect the raw data including immunophenotypic abnormalities, and audit the ML recommendations against the WHO criteria.
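The abstract does not say how missing markers are imputed when pooling data from different panels. As a hedged illustration only, the sketch below uses plain least-squares regression: on reference cells from a panel that measured every marker, it learns to predict one marker from the markers both panels share, then fills that marker in for cells from a panel that lacked it. The function names `fit_marker_imputer` and `impute_marker` are hypothetical, the linear model is a stand-in chosen for simplicity rather than the authors' method, and intensities are assumed to be already transformed (for example arcsinh-scaled) onto a comparable scale.

```python
import numpy as np

def fit_marker_imputer(reference, shared_idx, target_idx):
    """Fit a least-squares map from shared markers to one missing marker.

    reference  : (n_cells, n_markers) intensities from a panel that measured all markers
    shared_idx : column indices of markers measured in both panels
    target_idx : column index of the marker absent from the other panel
    Returns coefficients for the shared markers followed by an intercept.
    """
    X = reference[:, shared_idx]
    X1 = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    y = reference[:, target_idx]
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def impute_marker(cells_shared, coef):
    """Predict the missing marker for cells that only have the shared markers."""
    X1 = np.column_stack([cells_shared, np.ones(len(cells_shared))])
    return X1 @ coef

# Synthetic demo: marker 2 depends linearly on markers 0 and 1 (a made-up relation).
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 3))
reference[:, 2] = 2.0 * reference[:, 0] - reference[:, 1] + 0.5
coef = fit_marker_imputer(reference, shared_idx=[0, 1], target_idx=2)

other_panel = rng.normal(size=(100, 2))  # cells measured without marker 2
imputed = impute_marker(other_panel, coef)
```

A single per-cell linear map ignores population structure; a production pipeline might condition on cell population or use a nonlinear model. The synthetic values exist only to exercise the code.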
User feedback on each case was recorded, allowing the consistency of the ML recommendations with expert opinion to be assessed. Before the product was integrated into the clinical routine, a retrospective ‘dry run’ study (two-arm, four-center, blinded) using 96 randomly selected, non-overlapping B-NHL cases showed that analysis time was reduced by more than 2× and that accuracy against a gold standard increased slightly (not significantly) compared with the control arm using a traditional workflow.

Results: A worst-case sub-typing performance was estimated by testing the model against historic data without expert supervision, yielding an F1 score of >90% and a sensitivity of 96%. The ML models for acute leukemias specifically show an F1 score of ≈90%, a false positive rate of ≈1.5%, and a false negative rate of 2-6%. These quality metrics indicate that the system can be used for both screening and sub-typing. hema.to has therefore been in routine use for B-NHL diagnosis at Result Laboratorium (Dordrecht) and the HpH (Hamburg) since January; at the HpH this includes a deep integration into the laboratory database with automated reporting of expressed CD markers. To date, more than a thousand routine cases have been analyzed with the system. We then measured the relationship between the model's predicted sub-typing confidence and agreement with the final expert judgment. On average, confidence ≈ accuracy (R² = 0.94), indicating that the model is neither over- nor underconfident; in this sense it is a “trustworthy” (well-calibrated) classifier. Tracking expert agreement over time showed that the quality of the diagnostic recommendations remained stable despite a change of device type and consistently exceeded 90% across various metrics, including 100% specificity during the four most recent recorded months.
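The screening metrics and the calibration check reported above can be made concrete with a short sketch. It treats screening as a binary task (malignant vs. non-malignant), derives sensitivity, specificity, F1, and the false positive and false negative rates from confusion counts, and compares mean confidence with observed accuracy per confidence bin. The abstract does not specify its binning, whether R² was computed against the identity line or a fitted regression, or exactly how its false positive rate was defined; the choices below (ten equal-width bins, R² of binned accuracy around the identity line confidence = accuracy, false positive rate = 1 − specificity) are assumptions, and `screening_metrics` and `calibration_r2` are hypothetical names.

```python
import numpy as np

def screening_metrics(y_true, y_pred):
    """Binary screening metrics; 1 = malignant, 0 = non-malignant."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp = np.sum(t & p)
    tn = np.sum(~t & ~p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,  # assumed definition
        "false_negative_rate": 1 - sensitivity,
        "f1": f1,
    }

def calibration_r2(confidence, correct, n_bins=10):
    """Bin cases by predicted confidence and score agreement with confidence = accuracy."""
    conf = np.asarray(confidence, dtype=float)
    hit = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index per case; clip so a confidence of exactly 1.0 falls in the last bin.
    idx = np.clip(np.digitize(conf, edges) - 1, 0, n_bins - 1)
    mean_conf, mean_acc = [], []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            mean_conf.append(conf[in_bin].mean())
            mean_acc.append(hit[in_bin].mean())
    mean_conf, mean_acc = np.array(mean_conf), np.array(mean_acc)
    ss_res = np.sum((mean_acc - mean_conf) ** 2)  # deviation from the identity line
    ss_tot = np.sum((mean_acc - mean_acc.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("need at least two bins with different accuracies")
    return 1.0 - ss_res / ss_tot
```

Because R² here is taken around the identity line rather than a fitted regression line, a systematically overconfident model scores low even when its accuracy rises with confidence, which matches the "neither over- nor underconfident" reading in the text.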
Conclusions: We have shown that machine learning can provide decision support and automated reporting for screening and classifying blood cancers from flow cytometry data in a real-world routine setting. Our next focus is to expand the number of indications and to increase the model's sensitivity to small pathological populations, including measurable residual disease. These results represent major strides toward decision support software for any laboratory and all hematological neoplasms. Such software will not only speed up and simplify diagnostic workflows but also improve the quality of the analysis.
