Enhancing classification of cells procured from bone marrow aspirate smears using generative adversarial networks and sequential convolutional neural network

Woo Jin Kim, Debapriya Hazra, Yung-Cheol Byun

doi:10.1016/j.cmpb.2022.107019

Abstract

Background and Objective: Leukemia represents 30% of all pediatric cancers and is considered the most common malignancy affecting adults and children. The cell differential count obtained from bone marrow aspirate smears is crucial for diagnosing hematologic diseases. Classifying these cell types is an essential step in analyzing the disease, but it is time-consuming and requires intensive manual intervention. While machine learning has shown excellent results in automating medical diagnosis, it needs ample data to build an efficient model for real-world tasks. This paper aims to generate synthetic data to enhance the classification accuracy of cells obtained from bone marrow aspirate smears.

Methods: We propose a three-stage architecture. In the first stage, we collaborate with medical experts to prepare a dataset that consolidates microscopic cell images obtained from bone marrow aspirate smears from three different sources. The second stage uses a generative adversarial network (GAN) to generate synthetic microscopic cell images. Our GAN consists of three networks: a generator, a discriminator, and a classifier. We train it with the loss function of the Wasserstein GAN with gradient penalty (WGAN-GP). Because the model has an additional classifier and is trained using WGAN-GP, we name it C-WGAN-GP. In the third stage, we propose a sequential convolutional neural network (CNN) to classify cells in the original and synthetic datasets, demonstrating how generating synthetic data and using a simple sequential CNN model can enhance cell classification accuracy.

Results: We validated the proposed C-WGAN-GP and sequential CNN model with various evaluation metrics and achieved a classification accuracy of 96.98% using the synthetic dataset. We report the accuracy, specificity, and sensitivity for each cell type.
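Per-class accuracy, sensitivity, and specificity for a multi-class classifier are commonly computed one-vs-rest: each cell type in turn is treated as the positive class and all other types as negative. The abstract does not say how the authors computed these figures, so the sketch below simply assumes that convention; the function `per_class_metrics` and the toy labels (other than "neutrophil", which the paper mentions) are hypothetical.

```python
def per_class_metrics(y_true, y_pred, labels):
    """One-vs-rest accuracy, sensitivity, and specificity for each class (assumed convention)."""
    n = len(y_true)
    results = {}
    for c in labels:
        # Confusion counts with class c as "positive" and every other class as "negative".
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = n - tp - fn - fp
        results[c] = {
            "accuracy": (tp + tn) / n,
            # Sensitivity (recall): share of true class-c cells predicted as c.
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            # Specificity: share of non-c cells not predicted as c.
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return results


# Hypothetical toy predictions, for illustration only.
y_true = ["neutrophil", "neutrophil", "lymphocyte", "monocyte", "lymphocyte", "neutrophil"]
y_pred = ["neutrophil", "lymphocyte", "lymphocyte", "monocyte", "lymphocyte", "neutrophil"]
metrics = per_class_metrics(y_true, y_pred, ["neutrophil", "lymphocyte", "monocyte"])
for cell, m in metrics.items():
    print(cell, m)
```

Here the single neutrophil misclassified as a lymphocyte lowers neutrophil sensitivity and lymphocyte specificity, while leaving monocyte scores untouched.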
The sequential CNN model achieves its highest per-class accuracy, 97.5%, for neutrophils. The highest sensitivity and specificity values are 97.1% and 97%, respectively. Our proposed GAN model achieved an inception score of 14.52 ± 0.10, significantly better than existing GAN models.

Conclusions: The three-network GAN architecture produced more realistic synthetic data than existing models. The sequential CNN model achieved higher classification accuracy with the synthetic data than with the original data.
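For reference, the inception score is conventionally defined as IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a pretrained classifier's class distribution for a generated image x and p(y) is its average over all generated images. It is usually reported as a mean ± standard deviation over several splits of the generated set. The sketch below assumes the class probabilities have already been computed; which classifier produced p(y|x), and how many splits were used in this paper, are not stated here, and the names `inception_score` and `inception_score_splits` are hypothetical.

```python
import math
import statistics


def inception_score(probs):
    """exp of the mean KL divergence between p(y|x) and the marginal p(y)."""
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all generated images.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    kl_total = 0.0
    for row in probs:
        # Terms with p == 0 contribute 0; when p > 0 the marginal q is also > 0.
        kl_total += sum(p * (math.log(p) - math.log(q)) for p, q in zip(row, marginal) if p > 0)
    return math.exp(kl_total / n)


def inception_score_splits(probs, n_splits=10):
    """Mean and (population) standard deviation of the score over equal-sized splits."""
    size = len(probs) // n_splits
    scores = [inception_score(probs[i * size:(i + 1) * size]) for i in range(n_splits)]
    return statistics.mean(scores), statistics.pstdev(scores)


# Confident, diverse predictions score near the number of classes;
# uniform (uninformative) predictions score exactly 1.
confident = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3
print(inception_score(confident))  # close to 3.0
print(inception_score(uniform))    # 1.0
print(inception_score_splits(confident * 2, n_splits=2))
```

Because the score ranges from 1 up to the number of classes the scoring classifier distinguishes, comparisons between GAN models are meaningful only when the same classifier and split protocol are used.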
