This study aims to systematically characterize the effect of CT parameter variations on images and on lung radiomic and deep features, and to evaluate the ability of different image harmonization methods to mitigate the observed variations. A retrospective in-house sinogram dataset of 100 low-dose chest CT scans was reconstructed by varying radiation dose (100%, 25%, 10%) and reconstruction kernel (smooth, medium, sharp). A set of image processing, convolutional neural network (CNN)-based, and generative adversarial network (GAN)-based methods was trained to harmonize all image conditions to a reference condition (100% dose, medium kernel). Harmonized scans were evaluated for image similarity using peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and learned perceptual image patch similarity (LPIPS), and for the reproducibility of radiomic and deep features using the concordance correlation coefficient (CCC). CNNs consistently yielded the highest image similarity metrics of all methods; for Sharp/10%, the condition with the poorest visual similarity, PSNR increased from a mean ± CI of 17.763 ± 0.492 to 31.925 ± 0.571, SSIM increased from 0.219 ± 0.009 to 0.754 ± 0.017, and LPIPS decreased from 0.490 ± 0.005 to 0.275 ± 0.016. Texture-based radiomic features exhibited greater variability across conditions (CCC of 0.500 ± 0.332) than intensity-based features (0.972 ± 0.045). GANs achieved the highest CCC of all methods (0.969 ± 0.009 for radiomic features and 0.841 ± 0.070 for deep features). Convolutional neural networks are suitable when downstream applications require visual interpretation of images, whereas generative adversarial networks are better alternatives for generating the reproducible quantitative image features needed for machine learning applications.
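The evaluation metrics named above have standard definitions; as a minimal illustrative sketch (not the authors' implementation), PSNR and Lin's CCC can be computed from NumPy arrays as follows, assuming images and feature vectors are already co-registered and paired:

```python
import numpy as np

def psnr(ref, img, data_range=None):
    """Peak signal-to-noise ratio of a test image against a reference image."""
    ref = np.asarray(ref, dtype=np.float64)
    img = np.asarray(img, dtype=np.float64)
    if data_range is None:
        # Infer the dynamic range from the reference (e.g. HU window width).
        data_range = ref.max() - ref.min()
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def ccc(x, y):
    """Lin's concordance correlation coefficient between paired feature values,
    e.g. the same radiomic feature under two reconstruction conditions."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances
    cov = np.mean((x - mx) * (y - my))   # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike the Pearson correlation, the CCC penalizes both location and scale shifts between the two conditions, which is why it is a common choice for feature reproducibility studies. SSIM and LPIPS require windowed statistics and a pretrained network, respectively, and are typically taken from libraries such as scikit-image and the `lpips` package.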
Understanding the efficacy of harmonization in addressing multi-parameter variability is crucial for optimizing diagnostic accuracy and a critical step toward building generalizable models suitable for clinical use.