BACKGROUND CONTEXT: Degenerative lumbar spondylolisthesis (DLS) is a prevalent spinal disorder that often requires surgical intervention. Accurately predicting surgical outcomes is crucial for guiding clinical decision-making, but it is challenging because postoperative results are multifactorial. Traditional risk assessment tools have limitations, and machine learning has the potential to enhance the precision and comprehensiveness of preoperative evaluations.

PURPOSE: We aimed to develop a machine learning algorithm that predicts surgical outcomes in patients with DLS undergoing spinal fusion surgery, using only preoperative data.

STUDY DESIGN: Retrospective cross-sectional study.

PATIENT SAMPLE: Patients with DLS undergoing lumbar spinal fusion surgery.

OUTCOME MEASURES: The outcome to be predicted was lower back pain (LBP) ≥ 4 on the numeric analogue scale (NAS) 2 years after surgery. LBP was evaluated as the average pain patients experienced at rest in the week before questioning. The NAS ranges from 0 to 10, with 0 representing no pain and 10 representing the worst pain imaginable.

METHODS: We conducted a retrospective analysis of prospectively enrolled patients who underwent spinal fusion surgery for DLS at our institution in the United States between January 2016 and December 2018. The initial patient characteristics included in model training were chosen based on clinical expertise and a literature review, and comprised demographic characteristics, comorbidities, and radiologic features. The data were split into a training and a validation dataset using a 60/40 split. Four machine learning models were trained: XGBoost, logistic regression, random forest, and support vector machine (SVM). The models were evaluated according to the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.
An AUC of 0.7 to 0.8 was considered fair, 0.8 to 0.9 good, and ≥ 0.9 excellent. Additionally, a calibration plot was generated and the Brier score was calculated for each model.

RESULTS: A total of 135 patients (66% female) were included. Of these, 38 (28%) reported LBP ≥ 4 after 2 years, representing the positive class. The XGBoost model demonstrated the best performance in the validation set, with an AUC of 0.81 (95% CI 0.67–0.95). The other machine learning models performed significantly worse, with AUCs of 0.52 (95% CI 0.37–0.68) for the SVM, 0.56 (95% CI 0.37–0.76) for logistic regression, and 0.56 (95% CI 0.37–0.78) for the random forest. In the XGBoost model, age, composition of the erector spinae, and severity of lumbar spinal stenosis were identified as the most important features.

CONCLUSIONS: This study represents a novel approach to predicting surgical outcomes in spinal fusion patients. The XGBoost model performed better than the classical models and highlighted the potential contributions of age and atrophy of the paraspinal musculature as significant factors. These findings have important implications for enhancing patient care through the identification of high-risk individuals and modifiable risk factors. As the incorporation of machine learning algorithms into clinical decision-making continues to gain traction in research and clinical practice, our results reinforce this trajectory by showcasing the potential of these techniques in forecasting surgical outcomes.
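
For readers less familiar with the evaluation metrics named in METHODS, the following is a minimal, standard-library Python sketch of how the outcome label, the AUC, the Brier score, and the AUC grading could be computed. It is not the study's code. The function names, the handling of the 0.8 and 0.9 boundaries (assigned to the higher grade), the label "below fair" for AUC < 0.7, and the toy numbers are all illustrative assumptions rather than details taken from the study.

```python
from typing import Sequence


def lbp_positive(nas_score: float) -> int:
    """Binarize the outcome: 1 if LBP on the NAS is >= 4, else 0."""
    return 1 if nas_score >= 4 else 0


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (0 is perfect; lower is better)."""
    if len(y_true) == 0:
        raise ValueError("Brier score needs at least one case")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_category(auc: float) -> str:
    """Grade an AUC using the thresholds stated in METHODS.
    Assumption: boundary values go to the higher grade, and anything
    under 0.7 is labeled 'below fair' (the abstract gives no name for it)."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    return "below fair"


# Toy example with made-up values (NOT study data).
nas_at_2_years = [6, 2, 4, 1, 0]
y_true = [lbp_positive(s) for s in nas_at_2_years]   # [1, 0, 1, 0, 0]
y_prob = [0.8, 0.3, 0.4, 0.5, 0.1]                   # hypothetical model outputs

auc = roc_auc(y_true, y_prob)
print(f"AUC = {auc:.2f} ({auc_category(auc)}), Brier = {brier_score(y_true, y_prob):.2f}")
```

The pairwise AUC above is quadratic in the number of patients, which is acceptable for a cohort of this size. In practice, an established library implementation would normally be preferred, along with a resampling method for the confidence intervals.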