Lung Nodules Identification in CT Scans Using Multiple Instance Learning

Wiem Safta, Hichem Frigui

doi:10.1109/icmla55696.2022.00089

Abstract

We propose a Multiple Instance Learning (MIL) approach for lung nodule classification to address limitations of current Computer-Aided Diagnosis (CAD) systems. One limitation is the need for a large collection of training samples that must be segmented and annotated by radiologists. Another is the use of a fixed volume size for all nodules regardless of their actual sizes. Using a MIL approach, we represent each nodule by a nested sequence of volumes centered at the identified center of the nodule. We extract one feature vector from each volume, and the set of feature vectors for each nodule is combined into a bag. Using this representation, we investigate and compare many MIL algorithms and feature extraction methods. We start by applying benchmark MIL algorithms to traditional engineered features based on the Gray Level Co-occurrence Matrix (GLCM). Then, we design and train simple Convolutional Neural Networks (CNNs) to learn and extract features that characterize lung nodules. These extracted features are then fed to a benchmark MIL algorithm to learn a classification model. We report the results of three experiments applied to both GLCM and CNN features using two benchmark datasets. We designed our experiments to compare the different features and to compare MIL with Single Instance Learning (SIL), in which a single feature vector represents a nodule. We show that our MIL representation using CNN features is more accurate for the lung nodule diagnosis task. We also show that the MIL representation achieves better results than SIL applied to the ground-truth region of each nodule.
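
As a rough illustration of the bag representation described in the abstract, the following Python sketch crops a nested sequence of cubic volumes around a nodule center and stacks one feature vector per volume into a bag. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`nested_crops`, `toy_features`, `make_bag`), the half-sizes, and the summary-statistic features are all hypothetical placeholders standing in for the GLCM or CNN features used in the paper.

```python
import numpy as np


def nested_crops(volume, center, half_sizes):
    """Return cubic sub-volumes of increasing size, all centered at `center`.

    `half_sizes` is a hypothetical choice; each crop spans 2*h + 1 voxels
    per axis, clipped to the bounds of the scan.
    """
    crops = []
    for h in half_sizes:
        lo = [max(c - h, 0) for c in center]
        hi = [min(c + h + 1, s) for c, s in zip(center, volume.shape)]
        crops.append(volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]])
    return crops


def toy_features(crop):
    """Placeholder feature extractor (assumption): four intensity statistics.

    The paper uses GLCM or CNN features here instead.
    """
    return np.array([crop.mean(), crop.std(), crop.min(), crop.max()])


def make_bag(volume, center, half_sizes=(4, 8, 12)):
    """Build one bag: one feature vector (instance) per nested volume."""
    crops = nested_crops(volume, center, half_sizes)
    return np.stack([toy_features(c) for c in crops])


# Synthetic stand-in for a CT scan; real data would come from a dataset loader.
scan = np.random.default_rng(0).normal(size=(32, 32, 32))
bag = make_bag(scan, center=(16, 16, 16))
```

Under this representation, a MIL classifier receives the whole bag (here a 3 x 4 array) with a single nodule-level label, whereas a SIL classifier would receive only one of its rows.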
