COVID-19 chest X-ray image classification in the presence of noisy labels

Xiaoqing Ying, Hao Liu, Rong Huang

doi:10.1016/j.displa.2023.102370

Abstract

Coronavirus Disease 2019 (COVID-19) has been declared a worldwide pandemic, and chest X-ray imaging is a key method for diagnosing it. Applying convolutional neural networks to medical images helps diagnose the disease accurately, and label quality plays an important role in the classification of COVID-19 chest X-rays. However, most existing classification methods ignore the fact that labels are rarely completely accurate, and noisy labels significantly degrade the performance of image classification frameworks. In addition, because lesions in COVID-19 chest X-ray images are widely distributed and the images contain many local features, existing label recovery algorithms face a bottleneck: noisy samples are difficult to reuse. Therefore, this paper introduces a general classification framework for COVID-19 chest X-ray images with noisy labels and proposes a noisy label recovery algorithm based on subset label iterative propagation and replacement (SLIPR). Specifically, the proposed algorithm first draws random subsets of the samples multiple times. It then integrates several techniques, including principal component analysis, low-rank representation, neighborhood graph regularization, and k-nearest neighbors, for feature extraction and image classification. Finally, it performs multi-level weight distribution and replacement on the labels to cleanse the noise. In addition, high-confidence samples are selected from the label-recovered dataset as the training set, which improves the stability and accuracy of the classification framework without affecting its inherent performance. Three typical datasets are used to conduct extensive experiments and to compare the proposed algorithm with existing algorithms under different metrics.
Experimental results on three publicly available COVID-19 chest X-ray image datasets show that the proposed algorithm effectively recovers noisy labels and improves the accuracy of the image classification framework by 18.9% on the Tawsifur dataset, 19.92% on the Skytells dataset, and 16.72% on the CXRs dataset. Compared with state-of-the-art algorithms, SLIPR achieves classification accuracy gains ranging from 8.67% to 19.38% on the three datasets, and it also offers some scalability while ensuring data integrity.
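The pipeline outlined in the abstract (repeated random subsets, feature extraction, k-nearest-neighbor voting, weighted label replacement, and high-confidence sample selection) can be illustrated with a minimal NumPy sketch. This is not the authors' SLIPR implementation: it assumes flattened image feature vectors `X`, integer labels `y`, PCA as the only feature extractor (low-rank representation and neighborhood graph regularization are omitted), and a simple inverse-rank vote weighting as a stand-in for the paper's multi-level weight distribution. The functions `pca_project`, `subset_knn_votes`, and `recover_labels`, together with all thresholds and defaults, are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project rows of X onto their top principal components (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def subset_knn_votes(X, y, n_classes, rng, n_rounds, subset_frac, k, n_components):
    """Per-sample class distributions from kNN votes over repeated random subsets."""
    n = len(y)
    votes = np.zeros((n, n_classes))
    weights = 1.0 / np.arange(1, k + 1)  # hypothetical: nearest neighbor weighs most
    for _ in range(n_rounds):
        # Draw a random subset of the samples (without replacement).
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        # Extract low-dimensional features for this subset only.
        F = pca_project(X[idx], n_components)
        # Pairwise squared distances; a sample may not vote for itself.
        d = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
        np.fill_diagonal(d, np.inf)
        nn = np.argsort(d, axis=1)[:, :k]
        # Accumulate rank-weighted votes from the neighbors' current labels.
        for row, i in enumerate(idx):
            np.add.at(votes[i], y[idx[nn[row]]], weights)
    totals = votes.sum(axis=1, keepdims=True)
    return np.divide(votes, totals, out=np.zeros_like(votes), where=totals > 0)


def recover_labels(X, y, n_classes, n_iters=3, n_rounds=50, subset_frac=0.6, k=5,
                   n_components=10, replace_thresh=0.7, keep_thresh=0.6, seed=0):
    """Hypothetical iterative label recovery; returns (recovered labels, keep mask)."""
    rng = np.random.default_rng(seed)
    y_cur = y.copy()
    for _ in range(n_iters):  # assumes n_iters >= 1
        probs = subset_knn_votes(X, y_cur, n_classes, rng, n_rounds,
                                 subset_frac, k, n_components)
        conf, pred = probs.max(axis=1), probs.argmax(axis=1)
        # Replace a label only when the neighborhood consensus is strong;
        # the updated labels feed the next iteration (iterative propagation).
        y_cur = np.where(conf >= replace_thresh, pred, y_cur)
    # Keep high-confidence samples whose label agrees with the final vote.
    keep = (conf >= keep_thresh) & (pred == y_cur)
    return y_cur, keep
```

Excluding each sample from its own neighborhood matters here: otherwise a mislabeled sample would always vote for its own noisy label. Drawing many random subsets, rather than running kNN once on the full set, spreads the votes over different neighborhoods, which is the intuition behind subset-based recovery. The `keep` mask corresponds to the abstract's step of training only on high-confidence samples after recovery.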

Full Text