Background Context
Prolonged operation time in posterior lumbar interbody fusion (PLIF) for patients with lumbar spinal stenosis (LSS) is associated with more complications and impairs postoperative recovery. A method that accurately identifies patients at risk of prolonged operation time is therefore needed.

Purpose
This research aimed to develop a clinical model to predict prolonged operation time in patients undergoing PLIF.

Study Design/Setting
This study applied a machine-learning approach to retrospectively collected data.

Patient Sample
A total of 3233 patients diagnosed with LSS who underwent PLIF at 22 hospitals in China between January 2015 and December 2022.

Outcome Measures
The primary outcome was operation time. Prolonged operation time was defined as an operation time above the 75th percentile of all operation times, which corresponded to more than 240 minutes.

Methods
The 3233 patients with LSS who underwent PLIF were divided by geographic region into one training group and four test groups. The training group included 1569 patients; Test1 had 541, Test2 had 403, Test3 had 351, and Test4 had 369 patients. Variables consisted of demographics, perioperative details, preoperative laboratory examinations, and other additional factors. Six algorithms were employed for variable screening, and variables identified by more than two screening methods were incorporated into the final model. In the training cohort, 10-fold cross-validation (CV) and Bayesian hyperparameter optimization were used to construct models with eleven machine learning algorithms. Each model was then evaluated on the four separate external test sets, and the mean Area Under the Curve (AUC) was computed to determine the best-performing model.
Further performance metrics of the best model were then evaluated, and SHapley Additive exPlanations (SHAP) were used for interpretability analysis to enhance decision-making transparency. Finally, an online calculator was created.

Results
Among the machine learning models, the Random Forest achieved the highest performance across the external test sets, with AUC values of 0.832 in Test1, 0.834 in Test2, 0.816 in Test3, and 0.822 in Test4. The top contributing variables were number of fused levels, preoperative APTT (pre-APTT), weight, and age. The model was deployed as a web-based calculator for clinical application (https://wenle.shinyapps.io/PPOT_LSS/).

Conclusions
This predictive model can facilitate identification of patients at risk of prolonged operation time in PLIF surgery. Predictive calculators are expected to improve preoperative planning, identify patients with high-risk factors, and help clinicians refine treatment plans and implement clinical interventions.
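
The selection rules stated in Outcome Measures and Methods (the 240-minute cutoff, variables kept when more than two screening methods agree, and the best model chosen by mean AUC over the four external test sets) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the screening-method names, the variable lists, and the comparison model with its AUC values are hypothetical placeholders. Only the Random Forest AUC values and the 240-minute threshold are taken from the abstract.

```python
from collections import Counter
from statistics import mean

PROLONGED_THRESHOLD_MIN = 240  # cutoff reported in the abstract


def is_prolonged(operation_minutes, threshold=PROLONGED_THRESHOLD_MIN):
    """Label an operation as prolonged when it exceeds the threshold."""
    return operation_minutes > threshold


def select_by_vote(selections, min_votes=3):
    """Keep variables chosen by more than two screening methods (>= 3 votes)."""
    votes = Counter(v for chosen in selections.values() for v in set(chosen))
    return sorted(v for v, n in votes.items() if n >= min_votes)


def best_by_mean_auc(auc_by_model):
    """Return the model with the highest mean AUC across test sets."""
    means = {model: mean(aucs) for model, aucs in auc_by_model.items()}
    return max(means, key=means.get), means


# Hypothetical screening output, for illustration only.
screening = {
    "method_a": ["levels_fused", "pre_aptt", "weight"],
    "method_b": ["levels_fused", "pre_aptt", "age"],
    "method_c": ["levels_fused", "weight", "age"],
    "method_d": ["levels_fused", "pre_aptt"],
}
selected = select_by_vote(screening)

auc_by_model = {
    # Test1..Test4 values reported in the abstract.
    "random_forest": [0.832, 0.834, 0.816, 0.822],
    # Hypothetical placeholder values, NOT from the study.
    "hypothetical_other_model": [0.80, 0.79, 0.81, 0.78],
}
best_model, mean_aucs = best_by_mean_auc(auc_by_model)
rf_mean_auc = round(mean_aucs["random_forest"], 3)
```

With the reported values, the mean external-test AUC of the Random Forest works out to 0.826.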