Abstract

Myelodysplastic neoplasms (MDS) encompass clonal myeloid malignancies characterized by ineffective hematopoiesis, cytopenias, myelodysplasia, and recurrent genetic events. Apart from genetic abnormalities and the assessment of peripheral blood for cytopenias, cytomorphologic evaluation of the bone marrow is key to initial diagnosis and response assessment, as well as to detecting disease transformation to acute myeloid leukemia (AML). However, identifying aberrant morphologies in bone marrow cells is challenging and prone to inter-observer variability even for seasoned morphologists, resulting in discrepancies between local-site and central review. Since the introduction of deep learning (DL) models for computer vision tasks, a wide variety of applications in imaging diagnostics has been addressed with convolutional neural networks, including the assessment of bone marrow cell types and the prediction of hematological malignancies.

Bone marrow smear (BMS) images of manually selected regions of interest (ROI) from 483 MDS patients treated at the University Hospital Dresden were captured at 40-fold magnification with a resolution of 2560×1920 pixels and pre-processed. In parallel, BMS images from two control cohorts were taken: (i) 1226 AML patients registered in the German Study Alliance Leukemia (SAL) registry, and (ii) 236 healthy bone marrow donors. In addition, an external validation cohort of 50 MDS patients was obtained from the Munich Leukemia Laboratory (MLL). Image-level labels were provided denoting the corresponding diagnosis: “MDS”, “AML”, and “healthy donor”. To balance the different sample sizes between the datasets, simple image augmentation techniques such as random sized cropping, color shifting, and linear transformations were used. Training and testing were performed with 5-fold cross-validation, i.e. five 80:20 train-test splits. Importantly, cell-level labels, e.g. specifically annotated cell types or features like distinct dysplastic morphologies, were not used to train our model, as we intended to abstract visual information exclusively through our neural networks instead of relying on expert-guided feature engineering. Six recently introduced convolutional neural network (CNN) architecture families were compared (ResNet-34/50/101/152, ResNeXt-50/101, ShuffleNet V2, DenseNet-121/169/201, Wide ResNet-50/101, and SqueezeNet 1.1). To evaluate individual model performance, we used accuracy, recall (= sensitivity), precision (= positive predictive value), and the area under the curve (AUC) of both the receiver operating characteristic (ROC) and the precision-recall curve (PRC). Illustrative sketches of the augmentation, cross-validation, evaluation, and explainability steps follow the abstract.

Binary classification tasks showed high accuracies of 0.956 and 0.981 for distinguishing MDS from healthy donors and from AML, respectively (Table I). Results were validated in the external MDS cohort, reaching an accuracy of 0.916 for MDS vs. donors and 0.968 for MDS vs. AML (Table I). For explainability, we used occlusion sensitivity mapping, which iteratively blocks image areas from being evaluated by the CNNs, measures the consecutive drops in prediction performance, and thus identifies image areas of high importance (highlighted in green). By reviewing images, we found the CNNs to target the nuclei not only of dysplastic cells but of all available cells in an ROI (Figure 1).

Using supervised end-to-end deep learning, we developed a software framework that distinguishes between MDS, AML, and healthy controls with very high accuracy. Importantly, the models do not require laborious feature engineering, i.e. manually drafted cell-level labels, alleviating the need for time-consuming and costly manual labeling and thereby bypassing a key bottleneck of current computer vision standards in microscopy.
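As a concrete illustration of the augmentation step, below is a minimal sketch using torchvision (an assumed framework; the abstract names the augmentation types but not a library, and all parameter values are illustrative, not the study's):

```python
# Minimal augmentation sketch: random sized cropping, color shifting, and
# linear (affine) transformations, as named in the abstract. Assumes PyTorch/
# torchvision; crop size, jitter strengths, and affine ranges are illustrative.
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomResizedCrop(224),                       # random sized cropping
    T.ColorJitter(brightness=0.2, contrast=0.2,
                  saturation=0.2, hue=0.05),        # color shifting
    T.RandomAffine(degrees=10, translate=(0.05, 0.05),
                   scale=(0.9, 1.1), shear=5),      # linear transformations
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),         # ImageNet statistics
])
```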
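The architecture comparison and the 5-fold cross-validation could be set up as in the following sketch, again assuming torchvision (version ≥ 0.13 for the `weights` argument); the `labels` array below is a toy placeholder for the image-level diagnosis labels:

```python
# Sketch: instantiate the compared CNN families with a 3-class head
# (MDS / AML / healthy donor) and run five stratified 80:20 splits.
import torch.nn as nn
import torchvision.models as models
from sklearn.model_selection import StratifiedKFold

ARCHITECTURES = ["resnet34", "resnet50", "resnet101", "resnet152",
                 "resnext50_32x4d", "resnext101_32x8d", "shufflenet_v2_x1_0",
                 "densenet121", "densenet169", "densenet201",
                 "wide_resnet50_2", "wide_resnet101_2", "squeezenet1_1"]

def build_model(name, num_classes=3):
    """Build one of the compared CNNs and resize its classifier head."""
    net = getattr(models, name)(weights=None)
    if hasattr(net, "fc"):                     # ResNet/ResNeXt/Wide ResNet/ShuffleNet V2
        net.fc = nn.Linear(net.fc.in_features, num_classes)
    elif name.startswith("densenet"):          # DenseNet: linear classifier
        net.classifier = nn.Linear(net.classifier.in_features, num_classes)
    else:                                      # SqueezeNet 1.1: 1x1 conv classifier
        net.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=1)
        net.num_classes = num_classes
    return net

labels = [0, 1, 2] * 10                        # toy stand-in for diagnosis labels
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(labels, labels)):
    model = build_model("resnet50")
    # ... train on train_idx, evaluate on test_idx (one 80:20 split per fold) ...
```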
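The evaluation metrics named above (accuracy, recall, precision, ROC-AUC, PR-AUC) map directly onto standard scikit-learn calls; `y_true` and `y_score` here are toy stand-ins for one binary task (e.g. MDS vs. healthy donor) on a held-out test fold:

```python
# Sketch of the per-fold evaluation metrics for a binary classification task.
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             roc_auc_score, precision_recall_curve, auc)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])     # 1 = MDS, 0 = donor (toy data)
y_score = np.array([0.92, 0.12, 0.85, 0.70, 0.33, 0.96, 0.05, 0.41])
y_pred = (y_score >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)           # = sensitivity
precision = precision_score(y_true, y_pred)     # = positive predictive value
roc_auc = roc_auc_score(y_true, y_score)        # AUC of the ROC curve
prec, rec, _ = precision_recall_curve(y_true, y_score)
pr_auc = auc(rec, prec)                         # AUC of the precision-recall curve
```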
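Finally, occlusion sensitivity mapping can be sketched as follows; this is a generic minimal implementation, not the study's code, and the patch size, stride, and fill value are assumptions:

```python
# Occlusion sensitivity sketch: slide a gray patch over the image, measure the
# drop in target-class probability, and treat large drops as high importance.
import torch

@torch.no_grad()
def occlusion_map(model, image, target_class, patch=32, stride=16, fill=0.5):
    """Return a heatmap of probability drops for a (C, H, W) image tensor."""
    model.eval()
    _, H, W = image.shape
    base = torch.softmax(model(image.unsqueeze(0)), dim=1)[0, target_class]
    heat = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.clone()
            occluded[:, y:y + patch, x:x + patch] = fill   # block this area
            p = torch.softmax(model(occluded.unsqueeze(0)), dim=1)[0, target_class]
            heat[i, j] = base - p                          # importance = drop
    return heat
```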
