Objectives: The Multicenter Anterior Cruciate Ligament (ACL) Revision Study (MARS) Group has implicated several factors that influence outcomes of revision ACL reconstruction (rACLR). As machine learning (ML) becomes increasingly utilized in the orthopaedic literature, applying ML methodology to MARS cohort data presents a valuable opportunity to translate data into patient-specific insights. This study sought to apply novel ML methodology to the MARS cohort data in order to determine 1) an optimal predictive model of rACLR graft failure and 2) the features most predictive of rACLR graft failure in the context of that model.

Methods: The MARS Group of 83 surgeons at 52 sites prospectively enrolled a cohort of patients undergoing rACLR. Patients were followed up at 2 and 6 years postoperatively for patient-reported outcomes (PROs), additional surgeries, and incidence of graft failure. Surgeon-reported intraoperative findings and preoperative radiographic measures were obtained. Data were preprocessed, and 381 demographic, clinical, and surgical features were used to build five ML models predicting graft failure at 6 years postoperatively. The models were traditional logistic regression (LR), XGBoost, Gradient Boosting, Random Forest, and a validated ensemble algorithm, AutoPrognosis (AP). Validated performance metrics for binary outcome prediction models were used to assess discriminative power and calibration. Individual feature importance was calculated for the highest-performing models using partial dependence and perturbation-based feature importance.

Results: The cohort included 831 patients who completed 6-year follow-up, of whom 5.8% (n=48) experienced graft failure. While all models had moderate to good concordance, AP demonstrated the highest discriminative power as measured by the area under the receiver operating characteristic curve (AUROC: AP 0.722, Random Forest 0.621, Gradient Boosting 0.654, XGBoost 0.690, LR 0.630). The AP model was well calibrated, with calibration scores similar to those of the other studied models (Brier score: AP 0.054, Random Forest 0.054, Gradient Boosting 0.059, XGBoost 0.059, LR 0.102). The features deemed important for AP differed from those important for LR. For AP, partial dependence feature importance calculations identified the following top contributors to model predictive ability: surgeon years of experience, prior femoral tunnel position measured via preoperative radiograph, compromised prior ACLR femoral tunnel position and size, and baseline patient age. Similarly, AP perturbation-based calculations identified the following top five features contributing most to AUROC performance: baseline patient age, current ACLR graft type, years since previous ACLR, prior ACLR femoral tunnel position, and prior tibial tunnel position on sagittal view preoperative radiographs. Perturbation-based calculations identified the following top five features contributing most to performance on the area under the precision-recall curve (AUPRC): current ACLR graft type, baseline patient age, prior tibial tunnel position on sagittal view preoperative radiographs, previous ipsilateral medial meniscus repair, and baseline SF-36 subscale scores for patient vitality. Other leading contributors to model predictive ability were baseline PROs, including SF-36 subscale scores for physical function, mental health, and composite mental health; KOOS quality of life; WOMAC pain; and Marx activity rating scores.

Conclusions: Of the models studied in this preliminary analysis, AP appears to predict rACLR graft failure most accurately. These findings build on prior studies, identifying key surgical, clinical, radiographic, and patient-reported factors contributing to the model’s ability to predict rACLR graft failure at 6 years postoperatively. Further work includes creating a clinical risk calculator using the important feature inputs and using survival modeling to determine risk scores at various postoperative time points.
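
For readers less familiar with the two reported metrics, the following is a minimal sketch of how AUROC (discrimination) and the Brier score (calibration and overall accuracy) can be computed for a binary outcome such as graft failure. It is not the study's code: the function names `auroc` and `brier_score` and the toy labels and probabilities are hypothetical and chosen only for illustration. The only figures taken from the abstract are the cohort size (831) and number of failures (48).

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a randomly chosen failure is scored higher than a
    randomly chosen non-failure (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (0 is perfect; lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = graft failure, 0 = no failure.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 0]
toy_probs = [0.02, 0.05, 0.10, 0.03, 0.40, 0.20, 0.15, 0.01]

toy_auroc = auroc(toy_labels, toy_probs)
toy_brier = brier_score(toy_labels, toy_probs)

# Reference point built from the cohort counts in the abstract: a model that
# assigns every patient the overall failure rate. This is illustrative
# arithmetic, not a result reported by the authors.
n_patients, n_failures = 831, 48
cohort_labels = [1] * n_failures + [0] * (n_patients - n_failures)
constant_probs = [n_failures / n_patients] * n_patients

constant_auroc = auroc(cohort_labels, constant_probs)
constant_brier = brier_score(cohort_labels, constant_probs)

print(f"toy AUROC={toy_auroc:.3f}, toy Brier={toy_brier:.3f}")
print(f"constant-rate AUROC={constant_auroc:.3f}, Brier={constant_brier:.4f}")
```

Because the Brier score depends on how common the outcome is, it is usually read alongside a reference value for the same event rate, while AUROC reflects only how well patients are ranked. The constant-rate predictor above is one such reference.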