In lung cancer, early diagnosis can potentially improve prognosis. Accurate interpretation of computed tomography (CT) scans demands significant effort from radiologists because of the large number of slices analyzed in each patient's examination. Computer-aided diagnosis (CAD) systems have been applied in several medical fields, most notably in lung nodule detection and classification. CAD systems for lung lesion classification usually extract different types of features from lesions, such as texture, shape, and intensity. This exploratory study investigates the performance of lung nodule classification on 2D and 3D CT lesion images using Haralick texture feature analysis and binary logistic regression. Expert radiologists manually segmented the nodules in a CT dataset of 17 benign and 20 malignant nodules, all of which have anatomopathological results. Haralick features were extracted from 2D lesion images, using the largest cross-sectional nodule area, and from the entire nodule volume (3D). Principal Component Analysis (PCA) was applied to reduce the dimensionality of the texture features, showing that two and three principal components (PCs) explain 85.8% and 96.25% of the data variance for 2D lesions, and 72.4% and 91.7% for 3D lesions, respectively. Binary logistic regression, evaluated with leave-one-out cross-validation, showed no difference in accuracy (63%-68%) between two and three PCs. The highest sensitivity (75%) was obtained using 2D images with two or three PCs, while the highest specificity (65%) was obtained using 3D images with two or three PCs. Binary logistic regression using a small number of Haralick texture features showed better accuracy in lung nodule classification than visual evaluation by radiologists, despite the limited dataset. Further studies are needed to generalize and improve these results.
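As a rough illustration of the texture step described above, the following sketch builds a gray-level co-occurrence matrix (GLCM) for a 2D patch and derives four of the classical Haralick descriptors from it. It is a minimal sketch, not the study's implementation: the function names `glcm` and `haralick_subset`, the single pixel offset, the number of gray levels, and the toy patch are all assumptions made for illustration, and the abstract does not state which Haralick features, offsets, or quantization were used.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for pixel offset (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels).
    The offset and quantization are illustrative assumptions.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def haralick_subset(p):
    """Contrast, energy (angular second moment), homogeneity, and entropy."""
    contrast = energy = homogeneity = entropy = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            d2 = (i - j) ** 2
            contrast += d2 * pij
            energy += pij * pij
            homogeneity += pij / (1.0 + d2)
            if pij > 0:
                entropy -= pij * math.log(pij)
    return {
        "contrast": contrast,
        "energy": energy,
        "homogeneity": homogeneity,
        "entropy": entropy,
    }


if __name__ == "__main__":
    # Hypothetical 2x2 patch with two gray levels, not real CT data.
    patch = [[0, 1],
             [0, 1]]
    features = haralick_subset(glcm(patch, levels=2))
    print(features)  # contrast 1.0, energy 0.5, homogeneity 0.5, entropy ln(2)
```

In a pipeline like the one described, one feature vector of this kind per nodule would presumably be standardized, projected onto two or three principal components, and passed to a binary logistic regression evaluated with leave-one-out cross-validation. For the 3D case, one plausible extension is to accumulate co-occurrence counts over offsets in all three axes of the segmented volume, though the abstract does not say how this was done.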