Surgical repair of hip fracture carries substantial short-term risks of mortality and complications. For most patients with hip fractures, the risk-reward calculus favors surgical repair. However, some patients have low prefracture functioning, frailty, and/or a very high risk of postoperative mortality, which makes the choice between surgical and nonsurgical management more difficult. The importance of high-quality informed consent and shared decision-making for frail patients with hip fracture has recently been demonstrated. A tool that accurately estimates patient-specific surgical risks could improve these processes.

With this study, we sought (1) to develop, validate, and estimate the overall accuracy (C-index) of risk prediction models for 30-day mortality and complications after hip fracture surgery; (2) to evaluate the accuracy (sensitivity, specificity, and false discovery rates) of risk prediction thresholds for identifying very high-risk patients; and (3) to implement the models in an accessible web calculator.

In this comparative study, preoperative demographics, comorbidities, and preoperatively known operative variables were extracted for all 82,168 patients aged 18 years and older who underwent surgery for hip fracture in the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) between 2011 and 2017. Eighty-two percent (66,994 of 82,168) of patients were at least 70 years old, 21% (17,007 of 82,168) were at least 90 years old, 70% (57,260 of 82,168) were female, and 79% (65,301 of 82,168) were White. A total of 5% (4260 of 82,168) of patients died within 30 days of surgery, and 8% (6786 of 82,168) experienced a major complication. The ACS-NSQIP database was chosen for its clinically abstracted, reliable data on important surgical outcomes from more than 600 hospitals, as well as its rich characterization of preoperative demographic and clinical predictors for a demographically diverse patient population.
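The cohort percentages reported above are rounded to whole numbers. This short Python check recomputes each one from the stated counts. It uses only figures given in this abstract and makes no assumption about the underlying dataset.

```python
# Cohort counts reported in the abstract (ACS-NSQIP hip fracture surgery, 2011-2017).
TOTAL_PATIENTS = 82_168
COUNTS = {
    "aged 70 years or older": 66_994,
    "aged 90 years or older": 17_007,
    "female": 57_260,
    "White": 65_301,
    "died within 30 days": 4_260,
    "major complication within 30 days": 6_786,
}


def whole_percent(count: int, total: int = TOTAL_PATIENTS) -> int:
    """Express count as a percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


for label, count in COUNTS.items():
    print(f"{label}: {count:,} of {TOTAL_PATIENTS:,} = {whole_percent(count)}%")
```

Each recomputed value matches the rounded percentage stated in the text (82%, 21%, 70%, 79%, 5%, and 8%).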
Using all the preoperative variables in the ACS-NSQIP dataset, least absolute shrinkage and selection operator (LASSO) logistic regression, a type of machine learning that selects variables to optimize accuracy and parsimony, was used to develop and validate models to predict two primary outcomes: 30-day postoperative mortality and any 30-day major complication. A major complication was defined as the occurrence of any of the following ACS-NSQIP complications: on a ventilator for longer than 48 hours, intraoperative or postoperative unplanned intubation, septic shock, deep incisional surgical site infection (SSI), organ/space SSI, wound disruption, sepsis, intraoperative or postoperative myocardial infarction, intraoperative or postoperative cardiac arrest requiring cardiopulmonary resuscitation, acute renal failure requiring dialysis, pulmonary embolism, stroke/cerebral vascular accident, and return to the operating room. Secondary outcomes were six recently developed clusters of complications that are increasingly used to build surgical risk models: (1) pulmonary complications, (2) infectious complications, (3) cardiac events, (4) renal complications, (5) venous thromboembolic events, and (6) neurological events. Overall model accuracy was assessed with tenfold cross-validation using the C-index, a measure of how well a model discriminates between patients who experience an outcome and those who do not. Each patient's predicted risks from the models were then used to estimate the accuracy (sensitivity, specificity, and false discovery rate) of a wide range of predicted risk thresholds. We then implemented the prediction models in a web-accessible risk calculator.

The 30-day mortality and major complication models had good to fair discrimination (C-indexes of 0.76 and 0.64, respectively) and good calibration throughout the range of predicted risk.
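The two accuracy measures described above can be illustrated with a minimal Python sketch. This is not the authors' code, and the patient risks and outcomes below are hypothetical toy values.

- `c_index` computes concordance for a binary outcome. Among all pairs made of one patient with the outcome and one without, it returns the share in which the patient with the outcome received the higher predicted risk. For a binary outcome this equals the area under the ROC curve.
- `threshold_metrics` flags patients at or above a chosen predicted-risk threshold. It then reports sensitivity, specificity, and false discovery rate against the observed outcome.

The function names, the toy data, and the choice to count a risk exactly equal to the threshold as "above" it are all illustrative assumptions. The study itself evaluated tenfold cross-validated predictions from LASSO models, which this sketch does not reproduce.

```python
from itertools import product
import math


def c_index(risks, outcomes):
    """C-index (concordance) for a binary outcome (hypothetical helper).

    Over every pair of one patient with the outcome (1) and one without (0),
    return the share of pairs in which the patient with the outcome has the
    higher predicted risk; tied predictions count as half a concordant pair.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for case_risk, control_risk in product(cases, controls):
        if case_risk > control_risk:
            concordant += 1.0
        elif case_risk == control_risk:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def threshold_metrics(risks, outcomes, threshold):
    """Accuracy of flagging patients whose predicted risk is >= threshold.

    Assumption: a risk exactly equal to the threshold counts as "above" it.
    Ratios with a zero denominator (e.g. nobody flagged) are returned as NaN.
    """
    tp = fp = tn = fn = 0
    for risk, y in zip(risks, outcomes):
        flagged = risk >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else math.nan

    return {
        # Share of patients with the outcome who were flagged.
        "sensitivity": ratio(tp, tp + fn),
        # Share of patients without the outcome whose risk is below the threshold.
        "specificity": ratio(tn, tn + fp),
        # Share of flagged patients who did not experience the outcome.
        "false_discovery_rate": ratio(fp, tp + fp),
    }


# Hypothetical predicted 30-day mortality risks and observed outcomes (1 = died).
risks = [0.02, 0.05, 0.08, 0.12, 0.16, 0.18, 0.22, 0.30, 0.04, 0.20]
died = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]

print(f"C-index: {c_index(risks, died):.3f}")
for t in (0.10, 0.15, 0.20):
    metrics = threshold_metrics(risks, died, t)
    print(t, {name: round(value, 2) for name, value in metrics.items()})
```

Sweeping the threshold shows the trade-off the study reports. Raising the threshold flags fewer survivors, which increases specificity. The false discovery rate is a separate measure: it depends on how many of the flagged patients actually had the outcome. A threshold can therefore have high specificity and a high false discovery rate at the same time when the outcome is uncommon.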
Thresholds of predicted risk to identify patients at very high risk of 30-day mortality had high specificity but also high false discovery rates. For example, a 30-day mortality predicted risk threshold of 15% resulted in 97% specificity, meaning that 97% of patients who survived longer than 30 days had a predicted risk below that threshold. However, this threshold had a false discovery rate of 78%, meaning that 78% of patients above the threshold survived longer than 30 days and might have benefited from surgery. The tool is available here: https://s-spire-clintools.shinyapps.io/hip_deploy/.

The models of mortality and complications we developed may be accurate enough for some uses, especially personalizing informed consent and shared decision-making with patient-specific risk estimates. However, the high false discovery rate suggests that the models should not be used to restrict access to surgery for high-risk patients.

Deciding which measures of accuracy to prioritize, and what is "accurate enough," depends on the clinical question and on how the predictions will be used. Discrimination and calibration are commonly used measures of overall model accuracy but may be poorly suited to certain clinical questions and applications. Clinically, overall accuracy may matter less than knowing how accurate and useful specific values of predicted risk are for specific purposes.

Level of Evidence: Level III, therapeutic study.