Abstract Multiple myeloma, the second most common hematologic malignancy, is characterized by recurrent molecular genetic variants that carry important prognostic and therapeutic implications. These variants, which include chromosomal aneuploidies, translocations, and other structural changes, are currently identified by karyotyping and fluorescence in situ hybridization analysis of bone marrow biopsy samples. Comprehensive variant testing for full characterization of multiple myeloma cases can take days to weeks, is costly, and is not available to all patients. Past studies have used machine learning algorithms to explore the extent to which cellular morphologic information from bone marrow aspirates or immunophenotypic information from flow cytometry can predict these genetic variants. However, the integration of multiple hematopathology data modalities using machine learning to improve the diagnosis and characterization of hematologic neoplasms has yet to be explored. To this end, we developed a deep learning-based algorithm that, for individual patients, integrates tens of thousands of cell images from Wright-stained bone marrow aspirate smears with hundreds of thousands of events from myeloma flow cytometry panels to predict the presence or absence of particular molecular genetic variants among 85 cases of multiple myeloma. Attention-based multi-instance learning models were developed using convolutional neural networks to analyze cell image data and multi-layer perceptrons to analyze flow cytometry data. Models trained solely on cell image data showed variable performance in predicting genetic subtypes of multiple myeloma cases: t(11;14), AUROC = 0.795; 1q+, AUROC = 0.743; del(17p), AUROC = 0.520. Similar models trained solely on flow cytometry data showed fair performance for all subtypes: t(11;14), AUROC = 0.761; 1q+, AUROC = 0.761; del(17p), AUROC = 0.787.
However, models that combined features from high-attention events in both cell image and flow cytometry data yielded significantly improved performance: t(11;14), AUROC = 0.892; 1q+, AUROC = 0.789; del(17p), AUROC = 0.876. Similarly, models trained to predict established prognostic classifications of multiple myeloma cases performed best when cell image and flow cytometry data were integrated: cell images alone, AUROC = 0.721; flow cytometry alone, AUROC = 0.827; both data combined, AUROC = 0.839. Additional improvements in prognostic classification were obtained by using a Kronecker product to capture pairwise interactions between cell image and flow cytometry features more completely: AUROC = 0.864. This study is, to our knowledge, the first to use deep learning to integrate multiple hematopathology data modalities for automated diagnosis or subclassification of hematologic neoplasms in individual patients. The results demonstrate the feasibility of accurately predicting molecular genetic subtypes of multiple myeloma cases solely from cell morphology and flow cytometric information. With further refinement, these promising models could be applied in low-resource settings, as well as in research settings, to better understand the biological relationship of plasma cell genetics to morphology and immunophenotype.
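
To make the two fusion ideas named above concrete, the following is a minimal NumPy sketch of attention-based multi-instance pooling followed by Kronecker-product fusion. It is an illustration under stated assumptions, not the authors' implementation: the embedding sizes, the random weights standing in for trained parameters, the `attention_pool` and `kronecker_fuse` helper names, and the appended constant 1 (a common trick for keeping single-modality terms) are all hypothetical and do not come from the abstract.

```python
import numpy as np

def attention_pool(instances, V, w):
    """Attention-based MIL pooling: score each instance, softmax, weighted sum.

    instances: (n_instances, d) per-cell or per-event embeddings
    V: (d, h) and w: (h,) are attention parameters (random here, learned in practice)
    """
    scores = np.tanh(instances @ V) @ w        # one scalar score per instance
    scores = scores - scores.max()             # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    bag_vector = weights @ instances           # (d,) patient-level embedding
    return bag_vector, weights

def kronecker_fuse(z_img, z_flow):
    """Fuse two modality vectors with a Kronecker product.

    Appending 1 to each vector (an assumption of this sketch) keeps the
    unimodal features alongside every pairwise image-by-flow product.
    """
    a = np.append(z_img, 1.0)
    b = np.append(z_flow, 1.0)
    return np.kron(a, b)                       # length (len(z_img)+1) * (len(z_flow)+1)

rng = np.random.default_rng(0)

# Hypothetical embeddings for one patient: 500 cell images, 2000 flow events.
cell_feats = rng.normal(size=(500, 8))         # stand-in for CNN outputs
flow_feats = rng.normal(size=(2000, 4))        # stand-in for MLP outputs

V_img, w_img = rng.normal(size=(8, 16)), rng.normal(size=16)
V_flow, w_flow = rng.normal(size=(4, 16)), rng.normal(size=16)

z_img, a_img = attention_pool(cell_feats, V_img, w_img)
z_flow, a_flow = attention_pool(flow_feats, V_flow, w_flow)
fused = kronecker_fuse(z_img, z_flow)          # input to a downstream classifier
```

In this sketch the fused vector has 45 entries (9 x 5), covering every pairwise product of image and flow features plus the original features themselves. The abstract's simpler combined model could instead be approximated by concatenating `z_img` and `z_flow`, which omits the pairwise terms.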