Validation of a machine learning-derived clinical metric to quantify outcomes after total shoulder arthroplasty.

Christopher Roche, Joseph Zuckerman, Howard Routman, Vikas Kumar, Ryan Simovitch, Pierre-Henri Flurin, Steven Overman, Ankur Teredesai, Thomas Wright

doi:10.1016/j.jse.2021.01.021

Abstract

We propose a new clinical assessment tool constructed using machine learning, called the Shoulder Arthroplasty Smart (SAS) score, to quantify outcomes following total shoulder arthroplasty (TSA). Clinical data from 3667 TSA patients with 8104 postoperative follow-up reports were used to quantify the psychometric properties of validity, responsiveness, and clinical interpretability for the proposed SAS score and for each of the Simple Shoulder Test (SST), Constant, American Shoulder and Elbow Surgeons Standardized Shoulder Assessment Form (ASES), University of California Los Angeles (UCLA), and Shoulder Pain and Disability Index (SPADI) scores. Convergent construct validity was demonstrated, with all 6 outcome measures being moderately to highly correlated preoperatively and highly correlated postoperatively when quantifying TSA outcomes. The SAS score was most correlated with the UCLA score and least correlated with the SST. No clinical outcome score exhibited significant floor effects preoperatively or postoperatively or significant ceiling effects preoperatively; however, significant ceiling effects occurred postoperatively for each of the SST (44.3%), UCLA (13.9%), ASES (18.7%), and SPADI (19.3%) measures. Ceiling effects were more pronounced for anatomic than for reverse TSA, and generally, men, younger patients, and white patients who received TSA were more likely to experience a ceiling effect than TSA patients who were female, older, or of non-white race or ethnicity. The SAS score had the fewest patients with floor and ceiling effects and also exhibited no response bias in any patient characteristic analyzed in this study. Regarding clinical interpretability, patient satisfaction anchor-based thresholds for minimal clinically important difference and substantial clinical benefit were quantified for all 6 outcome measures; the SAS score thresholds were most similar in magnitude to those of the Constant score.
Regarding responsiveness, all 6 outcome measures detected a large effect, with the UCLA exhibiting the most responsiveness and the SST exhibiting the least. Finally, each of the SAS, ASES, Constant, and SPADI scores had similarly large responsiveness as measured by standardized response mean and effect size. The 6-question SAS score is an efficient TSA-specific outcome measure whose validity, responsiveness, and clinical interpretability are equivalent to or better than those of 5 other historical assessment tools. The SAS score has an appropriate response range without floor or ceiling effects and without bias in any target patient characteristic, unlike the age, gender, or race/ethnicity bias observed in the ceiling scores of the other outcome measures. Because of these substantial benefits, we recommend the use of the new SAS score for quantifying TSA outcomes.
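
For readers unfamiliar with the responsiveness and ceiling-effect statistics named above, the following is a minimal sketch using their common textbook definitions: effect size as mean change divided by the standard deviation of the preoperative scores, standardized response mean as mean change divided by the standard deviation of the change, and ceiling rate as the share of patients at the maximum score. These definitions, the function names, and the sample scores are assumptions for illustration; the abstract does not state the exact formulas or thresholds used in the study, and the numbers below are not study data.

```python
from statistics import mean, stdev


def effect_size(pre, post):
    """Mean pre-to-post change divided by the SD of the preoperative scores."""
    changes = [b - a for a, b in zip(pre, post)]
    return mean(changes) / stdev(pre)


def standardized_response_mean(pre, post):
    """Mean pre-to-post change divided by the SD of the change scores."""
    changes = [b - a for a, b in zip(pre, post)]
    return mean(changes) / stdev(changes)


def ceiling_rate(scores, max_score):
    """Fraction of patients scoring at (or above) the instrument's maximum."""
    return sum(s >= max_score for s in scores) / len(scores)


# Hypothetical scores on a 0-100 scale for five patients (illustrative only).
pre = [30, 40, 50, 35, 45]
post = [80, 100, 90, 100, 70]

print(f"effect size: {effect_size(pre, post):.2f}")
print(f"standardized response mean: {standardized_response_mean(pre, post):.2f}")
print(f"postoperative ceiling rate: {ceiling_rate(post, 100):.0%}")
```

Both responsiveness statistics share the same numerator, so they differ only in which variability they normalize by: baseline spread for effect size, spread of individual improvement for standardized response mean.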

Full Text