Objective: To develop and externally validate a binary classification model for lumbar vertebral body fractures on CT images using deep learning.
Methods: Data were collected from two hospitals for AI model training and external validation. Cohort A (Hospital 1) comprised CT images from 248 patients and 1508 vertebrae, of which 315 (20.9%) were fractured and 1193 (79.1%) were non-fractured. Cohort B (Hospital 2) comprised CT images from 148 patients and 887 vertebrae, of which 131 (14.8%) were fractured and 756 (85.2%) were non-fractured. The AI model comprised two stages: vertebral body segmentation and fracture classification. The first stage used a 3D V-Net deep convolutional neural network to produce a 3D segmentation map. From this map, the region of each vertebral body was extracted and passed to the second stage, which used a 3D ResNet deep convolutional neural network to classify each proposed region as positive (fractured) or negative (not fractured).
Results: The AI model’s accuracy for detecting vertebral fractures in Cohort A’s training set (n = 1199), validation set (n = 157), and test set (n = 152) was 100.0%, 96.2%, and 97.4%, respectively. For Cohort B (n = 148), the accuracy was 96.3%. The area under the receiver operating characteristic curve (AUC-ROC) values, with 95% confidence intervals (CIs), for the training, validation, and test sets of Cohort A and for Cohort B were 1.000 (1.000, 1.000), 0.978 (0.944, 1.000), 0.986 (0.969, 1.000), and 0.981 (0.970, 0.992), respectively. The corresponding area under the precision-recall curve (AUC-PR) values were 1.000 (0.996, 1.000), 0.964 (0.927, 0.985), 0.907 (0.924, 0.984), and 0.890 (0.846, 0.971).
According to the DeLong test, there was no significant difference in AUC-ROC between the test set of Cohort A and Cohort B, either for the overall data or for any specific vertebral location (all P > 0.05).
Conclusion: The developed model demonstrates promising diagnostic accuracy and applicability for detecting lumbar vertebral fractures.
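The hand-off between the two stages described in the Methods (extracting one region per vertebral body from the 3D segmentation map) can be illustrated with a minimal sketch. The abstract does not specify how regions were extracted, so everything here is an assumption for illustration: the integer label encoding (0 for background, one positive integer per vertebra), the bounding-box cropping strategy, the `margin` parameter, and the helper name `extract_vertebra_regions` are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def extract_vertebra_regions(volume, label_map, margin=2):
    """Crop one sub-volume per labelled vertebra (hypothetical helper).

    Assumes label_map uses 0 for background and a distinct positive
    integer for each vertebral body; this encoding is an assumption.
    """
    if volume.shape != label_map.shape:
        raise ValueError("volume and label_map must have the same shape")

    regions = {}
    for label in np.unique(label_map):
        if label == 0:  # assumed background label
            continue
        # Voxel coordinates belonging to this vertebra.
        coords = np.argwhere(label_map == label)
        # Axis-aligned bounding box, padded by `margin` and clipped to the volume.
        lo = np.maximum(coords.min(axis=0) - margin, 0)
        hi = np.minimum(coords.max(axis=0) + 1 + margin, volume.shape)
        slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
        regions[int(label)] = volume[slices]
    return regions


# Toy data standing in for a CT volume and a stage-one segmentation map.
volume = np.arange(10 * 10 * 10, dtype=np.float32).reshape(10, 10, 10)
label_map = np.zeros((10, 10, 10), dtype=np.int32)
label_map[1:4, 2:5, 2:5] = 1  # first synthetic "vertebra"
label_map[6:9, 2:5, 2:5] = 2  # second synthetic "vertebra"

regions = extract_vertebra_regions(volume, label_map, margin=1)
for label, crop in regions.items():
    print(label, crop.shape)
```

In a pipeline of this shape, each cropped region would typically be resampled to a fixed size before being passed to the classification network; that step is omitted here because the abstract gives no details about it.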