Constructing a Deep Learning Radiomics Model Based on X-ray Images and Clinical Data for Predicting and Distinguishing Acute and Chronic Osteoporotic Vertebral Fractures: A Multicenter Study

Guangyu Tang, Xueli Zhang, Zhipeng Liang, Liang Xia, Weixiao Zhang, Jianguo Xia, Jun Zhang, Lin Zhang, Yongkang Liu, Jiayi Liu, Jun Tang

doi:10.1016/j.acra.2023.10.061

Abstract

This study aimed to construct and validate a deep learning radiomics (DLR) model based on X-ray images for predicting and distinguishing acute and chronic osteoporotic vertebral fractures (OVFs).

A total of 942 cases (1076 vertebral bodies) from three hospitals, each with both a vertebral X-ray examination and an MRI scan, were included. The 1076 vertebral bodies were divided into a training cohort (n=712), an internal validation cohort (n=178), an external validation cohort (n=111), and a prospective validation cohort (n=75). The ResNet-50 architecture was used for deep transfer learning (DTL), with pre-training performed on the RadImageNet and ImageNet datasets. DTL features and radiomics features were extracted from lateral X-ray images of patients with OVFs and fused. A logistic regression model with the least absolute shrinkage and selection operator (LASSO) was established, with bone marrow edema on MRI as the gold standard for acute OVFs. Model performance was evaluated using receiver operating characteristic (ROC) curves. Eight machine learning classification models were evaluated for their ability to distinguish between acute and chronic OVFs. A nomogram was constructed by incorporating clinical baseline data to provide a visualized classification assessment. The predictive performance of the best RadImageNet and ImageNet models was compared using the DeLong test. The clinical value of the nomogram was evaluated using decision curve analysis (DCA).

After feature selection and fusion, the two pre-trained models yielded 34 and 39 fused features. The most effective machine learning algorithm in both DLR models was Light Gradient Boosting Machine. In the training cohort, the areas under the curve (AUCs) for distinguishing between acute and chronic OVFs were 0.979 and 0.972 for the RadImageNet and ImageNet models, respectively, with no statistically significant difference between them on the DeLong test (P=0.235). In the internal validation, external validation, and prospective validation cohorts, the AUCs for the two models were 0.967 vs 0.629, 0.886 vs 0.817, and 0.933 vs 0.661, respectively, with statistically significant differences in all comparisons (P<0.05). The deep learning radiomics nomogram (DLRN) was constructed by combining the RadImageNet predictive model with clinical baseline features, yielding AUCs of 0.981, 0.974, 0.895, and 0.902 in the training, internal validation, external validation, and prospective validation cohorts, respectively. In the training cohort, the AUCs of the fused feature model and the DLRN were 0.979 and 0.981, respectively, with no statistically significant difference between them on the DeLong test (P=0.169). In the internal validation, external validation, and prospective validation cohorts, the AUCs for the two models were 0.967 vs 0.974, 0.886 vs 0.895, and 0.933 vs 0.902, respectively, with statistically significant differences in all comparisons (P<0.05). The nomogram showed a slight improvement in predictive performance in the internal and external validation cohorts but a slight decrease in the prospective validation cohort (0.933 vs 0.902). DCA showed that the nomogram provided more benefit to patients than the DLR models.

Compared with the ImageNet model, the RadImageNet model has higher diagnostic value in distinguishing between acute and chronic OVFs. Diagnostic performance improves further when the model is combined with clinical baseline features to construct the nomogram.
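The abstract reports model performance as AUC and clinical value through decision curve analysis, but it does not say how either was computed. The following is a minimal sketch of the two standard definitions, not the authors' implementation. It assumes binary labels (1 = acute, 0 = chronic) and uses made-up predicted probabilities; the function names `auc`, `net_benefit`, and `net_benefit_treat_all` are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case (Mann-Whitney formulation; ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], probs: Sequence[float],
                threshold: float) -> float:
    """Net benefit of a model at one threshold probability, the quantity
    plotted on the y-axis of a decision curve:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy in a decision curve: classify every case as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (assumed, not from the study): 1 = acute OVF, 0 = chronic OVF.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print(auc(toy_labels, toy_probs))               # 0.9375
print(net_benefit(toy_labels, toy_probs, 0.5))  # 0.25
print(net_benefit_treat_all(toy_labels, 0.5))   # 0.0
```

A decision curve repeats the net benefit calculation across a range of thresholds. A model is preferred at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is always zero.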

Full Text