Machine Learning Based Sampling of X-Ray Images for a Computer-Aided Detection of Tuberculosis

Fernando Guimarães Ferreira, Rodrigo Coura Torres, Jose Seixas, Mayara Bastos, Micael Veríssimo Araújo, Philipp Gaspar, Anete Trajman, Lukas Müller, Carlos Eduardo Covas Costa

doi:10.21528/lnlm-vol20-no2-art7

Abstract

Computer-Aided Detection software relies on annotated X-ray datasets for its development. The annotation task is time-consuming and requires extensive expertise. This work presents a sampling method that selects the most relevant images to be annotated for the development of a tuberculosis (TB) screening platform based on machine learning algorithms. Sampling optimizes the annotation process by reducing the number of images to be analyzed without compromising the diversity and significance of the images in the dataset. We developed an algorithm that selects the images in a dataset to be annotated, based on similarity and dissimilarity measurements between images. A public TB image dataset was used to conduct this research. The experiment consisted of a deep learning feature engineering step, followed by topological analysis based on Self-Organizing Maps and K-Means. The effectiveness of the process is evaluated at each of its stages: classification, clustering, and the final sampling algorithm, which is based on similarity and dissimilarity features.
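
As a rough illustration of the general idea of sampling by similarity and dissimilarity, the sketch below clusters feature vectors with a plain K-Means and then, within each cluster, picks the images closest to the centroid (typical cases) together with those farthest from it (atypical cases). It is not the algorithm from the paper: the Self-Organizing Map stage is omitted, the features are assumed to have been extracted beforehand by some deep network, and the names `kmeans`, `sample_for_annotation`, and `per_cluster` are hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's K-Means. Returns (centroids, labels)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct samples.
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    for _ in range(n_iter):
        # Distance from every sample to every centroid, shape (n, k).
        dists = np.linalg.norm(
            features[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(
        features[:, None, :] - centroids[None, :, :], axis=2
    )
    return centroids, dists.argmin(axis=1)


def sample_for_annotation(features, k, per_cluster=4, seed=0):
    """Pick up to `per_cluster` image indices from each cluster.

    Assumption (not from the paper): half of the picks are the samples
    nearest to the centroid (similarity), the rest are the farthest
    ones (dissimilarity).
    """
    features = np.asarray(features, dtype=float)
    centroids, labels = kmeans(features, k, seed=seed)
    selected = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        d = np.linalg.norm(features[idx] - centroids[j], axis=1)
        order = idx[np.argsort(d)]  # nearest first
        n_near = (per_cluster + 1) // 2
        near = order[:n_near]
        # Farthest samples among those not already chosen as "near".
        far = order[n_near:][::-1][: per_cluster - n_near]
        selected.extend(near.tolist())
        selected.extend(far.tolist())
    return sorted(selected)
```

With 60 feature vectors, `k=3`, and `per_cluster=4`, the function returns at most 12 indices, which would be the images sent to annotators in this simplified setting.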

Full Text