Artificial Intelligence (AI)-Predicted Medical Diagnosis in Suspected Mature B-Cell Neoplasms Based on Flow Cytometric Raw Data

Martha-Lena Mueller, Lisa Erdl, Adriane Koppelle, Claudia Haferlach, Manja Meggendorfer, Sven Maschek, Wolfgang Kern, Torsten Haferlach

doi:10.1182/blood-2023-189799

Abstract

Introduction: AI is becoming an integral part of diagnostic workup solutions worldwide. Using our large database of flow cytometry data from hematologic neoplasms, we develop AI-powered workflows to support high-throughput routine diagnostics. Because one of our most frequent requests is the confirmation or exclusion of B-cell lymphoma, we initially focus on mature B-cell neoplasms (B-NHL), aiming to classify lymphoma cases reliably and to distinguish them from normal controls.

Methods: An AI model was developed to interpret flow cytometry data and classify B-NHL. To that end, we used uniformly processed samples that had been analyzed by human experts during standardized routine diagnostics (Navios / Cytoflex cytometers, Kaluza software, Beckman Coulter). The model used 3,830 cases to distinguish the lymphoma categories diagnosed in the routine setting. It was built with XGBoost using raw cytometric data and expert-informed features. Recall (R, corresponding to sensitivity), precision (P, the proportion of predicted positive cases that are truly positive) and prediction probability (PP, the confidence of the model in its prediction) were recorded. The predicted diagnosis was compared with the human expert diagnosis, and this information was used for further model improvement. The process comprised several steps adapted to the structure of flow cytometric data: data transformation and scaling, cluster formation, careful definition of light chain restriction, and feature engineering.

Results: For B-NHL (all categories) versus no lymphoma, the model initially achieved an average R of 88% and P of 87%. The model distinguished the categories 1) no lymphoma (R 95%, P 95%), 2) CLL (R 62%, P 71%), 3) HCL (R 96%, P 96%), 4) FL (R 88%, P 100%), 5) LPL (R 78%, P 70%), 6) MCL (R 97%, P 76%), 7) MZL (R 71%, P 71%) and 8) SBLPN (R 100%, P 100%). However, 364 of the 3,830 cases (9.5%) in our cohort could not be processed by the AI model due to technical limitations.
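The per-category R and P values reported above follow the standard definitions of recall and precision. As a minimal sketch of how such values can be derived by comparing expert diagnoses with model predictions, the following uses only the Python standard library. The function name and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def per_class_recall_precision(expert, predicted):
    """Return {label: (recall, precision)} for predicted vs. expert labels.

    Recall    = correctly predicted cases of a class / expert cases of that class.
    Precision = correctly predicted cases of a class / cases predicted as that class.
    A value is None when its denominator is zero.
    """
    if len(expert) != len(predicted):
        raise ValueError("label lists must have the same length")

    expert_total = Counter(expert)
    predicted_total = Counter(predicted)
    true_pos = Counter(e for e, p in zip(expert, predicted) if e == p)

    scores = {}
    for label in sorted(set(expert) | set(predicted)):
        recall = (true_pos[label] / expert_total[label]
                  if expert_total[label] else None)
        precision = (true_pos[label] / predicted_total[label]
                     if predicted_total[label] else None)
        scores[label] = (recall, precision)
    return scores


# Hypothetical toy labels for illustration only.
expert = ["CLL", "CLL", "CLL", "MCL", "MCL", "none", "none", "none"]
predicted = ["CLL", "CLL", "MCL", "MCL", "MCL", "none", "none", "CLL"]
scores = per_class_recall_precision(expert, predicted)

for label, (recall, precision) in scores.items():
    print(f"{label}: R={recall:.2f}, P={precision:.2f}")
```

In this toy example one CLL case is predicted as MCL, which lowers R for CLL and P for MCL at the same time. This is the kind of trade-off visible in the per-category values above.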
To make these cases accessible for preprocessing and analysis, to improve R and P values downstream, and to make the same strategy usable for further Flow-AI models, several technical foundations were revisited. Flow cytometry data span vastly different value ranges, e.g. for scatter versus fluorescence parameters. Since the model needs to identify the main populations of interest (such as lymphocytes, granulocytes and monocytes) dynamically, the data representation was modified, laying the foundation for all further analysis. Our data show that transforming the raw values and then scaling them to the range 0 to 1 yields superior clustering results. Normalization is performed with the RobustScaler from the scikit-learn library. With proper data transformation and scaling in place, all 3,830 cases could be processed successfully. The next step was to revisit the options for cluster formation, as clusters are a prerequisite for lymphoma identification in the downstream classifier. We obtained the best results with HDBSCAN after optimizing the hyperparameters for minimum cluster size and noise removal so that they adapt automatically to the number of CD45-positive cells or other intermediate gating results. HDBSCAN clustering reduced the misclassification rate from 20% to 17%. Further essential steps included a dynamic definition of the light chain separation, which enables the model to adapt to each case's specific kappa and lambda expression strength. The features were then updated to make use of all the new information. Together, these changes led to higher P values for CLL (71% to 80%) and LPL (70% to 74%) as well as higher R values for FL (88% to 89%) and MZL (71% to 76%).

Conclusion: AI can be applied to flow cytometric data to predict B-NHL. By fine-tuning data transformation, scaling and clustering, we substantially improved the analysis pipeline, recovering the 364 cases (9.5%) that had previously been inaccessible to analysis. Together with proper light chain definition and feature adaptation, this improved the R and P values of several lymphoma categories. The improvements result from explainable feature engineering, which lets us trace the weight and actual value of a feature down to the events in the dot plots and extends our explainable AI (XAI) strategy. Our work lays the groundwork for AI-based B-cell lymphoma identification.
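The transformation and scaling step can be illustrated with a small standard-library sketch. Several points are assumptions, since the abstract does not specify them: the arcsinh transform and its cofactor of 150 are a common choice for fluorescence channels but are not confirmed by the source, `robust_scale` re-implements the default behaviour of scikit-learn's RobustScaler (centre on the median, divide by the interquartile range), and `min_max_scale` is one possible way to reach the 0 to 1 range. Robust scaling alone does not bound values to 0 to 1, and how the two are combined in the actual pipeline is not stated.

```python
import math
import statistics


def arcsinh_transform(values, cofactor=150.0):
    """Compress the wide dynamic range of raw intensities (assumed transform)."""
    return [math.asinh(v / cofactor) for v in values]


def robust_scale(values):
    """Centre on the median and divide by the interquartile range.

    Mirrors the defaults of scikit-learn's RobustScaler for a single channel.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # avoid division by zero for constant channels
    return [(v - median) / iqr for v in values]


def min_max_scale(values):
    """Map values linearly onto the range 0 to 1 (assumed final step)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


# Hypothetical raw fluorescence intensities for one channel.
raw_fluorescence = [-120.0, 35.0, 410.0, 2600.0, 18000.0, 240000.0]

transformed = arcsinh_transform(raw_fluorescence)
robust = robust_scale(transformed)   # outlier-tolerant, but unbounded
unit = min_max_scale(transformed)    # bounded to [0, 1]

print([round(v, 3) for v in robust])
print([round(v, 3) for v in unit])
```

Because both scalers are linear rescalings, applying `min_max_scale` to the output of `robust_scale` gives the same result as applying it to `transformed` directly, up to rounding. The choice between them matters mainly when outliers should not determine the scale.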
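The idea of HDBSCAN hyperparameters that adapt to the number of CD45-positive cells can be sketched as a simple rule. The function name, the 0.5% fraction, the floor and cap, and the divisor below are all hypothetical placeholders; the abstract does not report the actual rule or values. The resulting names `min_cluster_size` and `min_samples` match parameters of `sklearn.cluster.HDBSCAN` (scikit-learn 1.3 or later), which the sketch does not import.

```python
def adaptive_hdbscan_params(n_cd45_positive,
                            size_fraction=0.005,
                            min_size_floor=25,
                            min_size_cap=2000,
                            samples_divisor=5):
    """Derive clustering hyperparameters from the CD45-positive event count.

    All default values are illustrative assumptions, not the study's settings.
    """
    if n_cd45_positive <= 0:
        raise ValueError("need at least one CD45-positive event")

    # Smallest group of events that may form its own cluster,
    # proportional to the sample size but kept within fixed bounds.
    min_cluster_size = round(n_cd45_positive * size_fraction)
    min_cluster_size = max(min_size_floor, min(min_size_cap, min_cluster_size))

    # A larger min_samples makes the clustering more conservative,
    # so more sparse events are labelled as noise.
    min_samples = max(1, min_cluster_size // samples_divisor)

    return {"min_cluster_size": min_cluster_size, "min_samples": min_samples}


# Hypothetical event counts for a small, a typical and a large sample.
small = adaptive_hdbscan_params(2_000)
typical = adaptive_hdbscan_params(100_000)
large = adaptive_hdbscan_params(1_000_000)

for name, params in [("small", small), ("typical", typical), ("large", large)]:
    print(name, params)
```

With scikit-learn installed, such a dictionary could be passed on as `HDBSCAN(**params)`. Tying the parameters to the event count keeps the smallest detectable population at a roughly constant fraction of the sample rather than at a fixed number of events.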
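A dynamic, per-case definition of the light chain separation could look like the following sketch. Everything here is an assumption for illustration: the threshold is placed in the largest gap of each case's own kappa and lambda intensities, and the ratio cutoff of 3 is a placeholder, not a clinical criterion or the rule used in the study. The input values are hypothetical and assumed to be already transformed.

```python
def largest_gap_threshold(values):
    """Place a threshold in the middle of the largest gap between sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two events")
    gaps = [(b - a, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    return max(gaps)[1]  # midpoint of the widest gap


def light_chain_counts(kappa, lambda_):
    """Count kappa-only and lambda-only events using case-specific thresholds."""
    k_thr = largest_gap_threshold(kappa)
    l_thr = largest_gap_threshold(lambda_)
    kappa_pos = sum(1 for k, l in zip(kappa, lambda_) if k > k_thr and l <= l_thr)
    lambda_pos = sum(1 for k, l in zip(kappa, lambda_) if l > l_thr and k <= k_thr)
    return kappa_pos, lambda_pos


def restriction_call(kappa_pos, lambda_pos, ratio_cutoff=3.0):
    """Label a B-cell cluster by its kappa:lambda ratio (placeholder cutoff)."""
    if kappa_pos > ratio_cutoff * lambda_pos:
        return "kappa-restricted"
    if lambda_pos > ratio_cutoff * kappa_pos:
        return "lambda-restricted"
    return "no restriction detected"


# Hypothetical transformed intensities for eight B-cell events.
kappa = [0.10, 0.20, 0.15, 2.90, 3.10, 3.00, 3.20, 2.80]
lambda_ = [2.50, 0.10, 0.20, 0.15, 0.10, 0.20, 0.10, 0.12]

kappa_pos, lambda_pos = light_chain_counts(kappa, lambda_)
call = restriction_call(kappa_pos, lambda_pos)
print(kappa_pos, lambda_pos, call)
```

Deriving the thresholds from each case's own distribution, rather than from a fixed value, is what lets the separation follow dim or bright light chain expression.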

Published in: Blood
