Repeatability and reproducibility of deep learning features for lung adenocarcinoma subtypes with nodules less than 10 mm in size: a multicenter thin-slice computed tomography phantom and clinical validation study.

Yi Zhan, Fangyun Li, Fei Shan, Renxiang Dai, Lingxiao Zhou, Zenghui Cheng, Yaoyao Zhuo

doi:10.21037/qims-24-77

Abstract

Deep learning features (DLFs), derived by fusing radiomics features (RFs) with deep learning, have shown potential for enhancing diagnostic capability. However, the limited repeatability and reproducibility of DLFs across multiple centers is a challenge for the clinical validation of these features. This study thus aimed to evaluate the repeatability and reproducibility of DLFs and their potential efficiency in differentiating subtypes of lung adenocarcinoma less than 10 mm in size and manifesting as ground-glass nodules (GGNs). A chest phantom with nodules was scanned repeatedly using different thin-slice computed tomography (TSCT) scanners with varying acquisition and reconstruction parameters. The robustness of the DLFs was measured using the concordance correlation coefficient (CCC) and intraclass correlation coefficient (ICC). A deep learning approach was used for visualizing the DLFs. To assess the clinical effectiveness and generalizability of the stable and informative DLFs, 275 patients from three hospitals, with 405 nodules pathologically diagnosed as GGN lung adenocarcinoma less than 10 mm in size, were retrospectively reviewed for clinical validation. Analysis of 64 DLFs revealed that slice thickness and slice interval (ICC, 0.79±0.18) and reconstruction kernel (ICC, 0.82±0.07) were significantly associated with the robustness of DLFs. Feature visualization showed that the DLFs were mainly focused around the nodule areas. In the external validation, a subset of 28 robust DLFs identified as stable under all sources of variability achieved the highest area under the curve [AUC = 0.65, 95% confidence interval (CI): 0.53-0.76] compared to the other DLF models and the radiomics model.
Although different manufacturers and scanning schemes affect the reproducibility of DLFs, certain DLFs demonstrated excellent stability and effectively improved the diagnostic efficacy for identifying subtypes of lung adenocarcinoma. Therefore, screening for stable DLFs as a first step in multicenter DLF research may improve diagnostic efficacy and promote the application of these features.
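
The CCC named above as a robustness measure can be illustrated with a minimal sketch. It uses Lin's standard definition of the concordance correlation coefficient and assumes that each feature is compared between two repeated scans of the same nodules. The function names, the array layout (rows are nodules, columns are features), and the 0.85 cutoff are illustrative assumptions, not values or code taken from the study.

```python
import numpy as np


def concordance_correlation_coefficient(x, y):
    """Lin's CCC between two paired measurement vectors.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) variance and covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length >= 2")
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    denom = x.var() + y.var() + (mean_x - mean_y) ** 2
    if denom == 0.0:
        # Both vectors are constant and identical; CCC is undefined.
        return float("nan")
    return float(2.0 * cov_xy / denom)


def select_stable_features(scan_a, scan_b, threshold=0.85):
    """Return column indices whose CCC between two scans meets the threshold.

    scan_a, scan_b: arrays of shape (n_nodules, n_features) holding the same
    features extracted from two acquisitions of the same nodules.
    threshold: hypothetical cutoff chosen for illustration only.
    """
    scan_a = np.asarray(scan_a, dtype=float)
    scan_b = np.asarray(scan_b, dtype=float)
    if scan_a.ndim != 2 or scan_a.shape != scan_b.shape:
        raise ValueError("scan_a and scan_b must be 2-D arrays of equal shape")
    stable = []
    for j in range(scan_a.shape[1]):
        ccc = concordance_correlation_coefficient(scan_a[:, j], scan_b[:, j])
        if ccc >= threshold:  # NaN compares False, so undefined CCCs are dropped
            stable.append(j)
    return stable
```

Unlike the Pearson correlation, the CCC penalizes a systematic offset between two scans: a feature that shifts by a constant between scanners remains perfectly correlated but has a CCC below 1.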

Full Text