Semi-supervised labelling of chest x-ray images using unsupervised clustering for ground-truth generation

Victor Ikechukwu Agughasi, Murali Srinivasiah

doi:10.31763/aet.v2i3.1143

Open Access

Journal: Applied Engineering and Technology
Publication Date: Sep 12, 2023
Citations: 3
License type: CC BY-SA 4.0

Affiliation: Maharaja Engineering College

Abstract

Supervised classifiers require large amounts of accurately labelled data to learn to recognize chest X-ray (CXR) images. However, manually labelling an extensive collection of CXR images is time-consuming and costly. To address this issue, a method for the semi-supervised labelling of extensive CXR collections is proposed that leverages unsupervised clustering with minimal expert knowledge to generate ground-truth images. The methodology has four steps. First, unsupervised clustering techniques such as K-Means and Self-Organizing Maps are applied. Second, the images are represented by five different feature vectors to take full advantage of the potential differences between features. Third, each data point receives the label of the center of the cluster to which it belongs. Finally, a majority vote decides the ground-truth image. The amount of human involvement is strictly limited by the number of clusters the chosen method creates. To evaluate the effectiveness of the proposed method, experiments were conducted on two publicly available CXR datasets, VinDr-CXR and Montgomery. The experiments showed that, for a KNN classifier, manually labelling only 1% (VinDr-CXR) or 10% (Montgomery) of the training data gives performance similar to labelling the whole dataset. To our knowledge, this is the first study to use the VinDr-CXR and Montgomery datasets for ground-truth image generation. Extensive experimental analysis using machine learning and statistical techniques shows that the proposed methodology efficiently generates ground-truth images from publicly available CXR datasets.
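
The four steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a plain pure-Python K-Means (no Self-Organizing Map), three toy feature views instead of the paper's five, and an "expert" simulated by a lookup into known labels. It also assumes that labelling a cluster center means asking the expert about the sample nearest to that center. All function names (`kmeans`, `label_view`, `majority_vote`) are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-Means; returns (centers, cluster index per point)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            # Keep the previous center if a cluster happens to be empty.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]


def label_view(points, k, ask_expert, seed=0):
    """Cluster one feature view and propagate one expert label per cluster."""
    centers, assign = kmeans(points, k, seed=seed)
    cluster_label = {}
    for c in set(assign):
        members = [i for i, a in enumerate(assign) if a == c]
        # Assumption: the expert labels the sample nearest to the center.
        rep = min(members, key=lambda i: sq_dist(points[i], centers[c]))
        cluster_label[c] = ask_expert(rep)
    return [cluster_label[a] for a in assign]


def majority_vote(per_view_labels):
    """Combine the labels from all views, sample by sample."""
    return [Counter(v).most_common(1)[0][0] for v in zip(*per_view_labels)]


# Toy data: six samples, two classes, three hypothetical feature views.
truth = [0, 0, 0, 1, 1, 1]
views = [
    [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)],
    [(1.0,), (1.2,), (0.9,), (9.0,), (9.3,), (8.8,)],
    # In this view sample 2 lies in the wrong cluster on purpose.
    [(0.0, 0.0), (0.3, 0.1), (4.9, 5.2), (5.0, 5.0), (5.1, 4.8), (4.7, 5.1)],
]

queries = []  # records every sample index the simulated expert was shown


def ask_expert(i):
    queries.append(i)
    return truth[i]


per_view = [label_view(v, k=2, ask_expert=ask_expert) for v in views]
consensus = majority_vote(per_view)
print(consensus, "expert queries:", len(queries))
```

In this sketch the expert is consulted once per cluster and per view, so the manual effort depends on the number of clusters rather than on the number of images. The third view is deliberately unreliable for one sample, which shows why a vote across several feature views can be more robust than any single clustering.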

Full Text