Calibration Metrics Research Articles

Incidental durotomy is an intraoperative complication in spine surgery that can lead to postoperative complications, increased length of stay, and higher healthcare costs. Natural language processing (NLP) is an artificial intelligence method that assists in understanding free-text notes and may be useful in the automated surveillance of adverse events in orthopaedic surgery. A previously developed NLP algorithm is highly accurate in the detection of incidental durotomy on internal validation and on external validation in an independent cohort from the same country. External validation in a cohort with linguistic differences, referred to as geographical validation, is required to assess the transportability of the developed algorithm. Ideally, the performance of a prediction model, here the NLP algorithm, is constant across geographic regions to ensure reproducibility and model validity. Can we geographically validate an NLP algorithm for the automated detection of incidental durotomy across three independent cohorts from two continents? Patients 18 years or older undergoing a primary procedure of (thoraco)lumbar spine surgery were included. In Massachusetts, between January 2000 and June 2018, 1000 patients were included from two academic and three community medical centers. In Maryland, between July 2016 and November 2018, 1279 patients were included from one academic center, and in Australia, between January 2010 and December 2019, 944 patients were included from one academic center. The authors retrospectively studied the free-text operative notes of included patients for the primary outcome, which was defined as intraoperative durotomy. Incidental durotomy occurred in 9% (93 of 1000), 8% (108 of 1279), and 6% (58 of 944) of the patients in the Massachusetts, Maryland, and Australia cohorts, respectively. No missing reports were observed.
Three datasets (Massachusetts, Australian, and combined Massachusetts and Australian) were divided into training and holdout test sets in an 80:20 ratio. An extreme gradient boosting (an efficient and flexible tree-based algorithm) NLP algorithm was individually trained on each training set, and the performance of the three NLP algorithms (American, Australian, and combined, respectively) was assessed by discrimination via area under the receiver operating characteristic curve (AUC-ROC; this measures the model's ability to distinguish patients who had the outcome from those who did not), calibration metrics (which compare the predicted with the observed probabilities), and Brier score (a composite of discrimination and calibration). In addition, the sensitivity (true-positive rate, also known as recall), specificity (true-negative rate), positive predictive value (also known as precision), negative predictive value, F1-score (composite of precision and recall), positive likelihood ratio, and negative likelihood ratio were calculated. The combined NLP algorithm (trained on the combined Massachusetts and Australian data) achieved excellent performance on independent testing data from Australia (AUC-ROC 0.97 [95% confidence interval 0.87 to 0.99]), Massachusetts (AUC-ROC 0.99 [95% CI 0.80 to 0.99]), and Maryland (AUC-ROC 0.95 [95% CI 0.93 to 0.97]). The NLP algorithm developed on the Massachusetts cohort had excellent performance in the Maryland cohort (AUC-ROC 0.97 [95% CI 0.95 to 0.99]) but worse performance in the Australian cohort (AUC-ROC 0.74 [95% CI 0.70 to 0.77]). We demonstrated the clinical utility and reproducibility of an NLP algorithm for detection of incidental durotomy: the algorithm trained on combined datasets retained excellent performance in individual countries relative to algorithms developed in the same country alone. Further multi-institutional, international collaborations can facilitate the creation of universal NLP algorithms that improve the quality and safety of orthopaedic surgery globally.
The combined NLP algorithm has been incorporated into a freely accessible web application that can be found at https://sorg-apps.shinyapps.io/nlp_incidental_durotomy/. Clinicians and researchers can use the tool to apply the model to spine registries or within quality and safety departments to automate detection of incidental durotomy and optimize prevention efforts. Level III, diagnostic study.
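The threshold-based metrics and the Brier score named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def confusion_counts(y_true, y_prob, threshold=0.5) -> ConfusionCounts:
    """Count TP/FP/TN/FN after thresholding predicted probabilities.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute the threshold-based metrics listed in the abstract."""
    sens = _ratio(c.tp, c.tp + c.fn)  # recall, true-positive rate
    spec = _ratio(c.tn, c.tn + c.fp)  # true-negative rate
    ppv = _ratio(c.tp, c.tp + c.fp)   # precision
    npv = _ratio(c.tn, c.tn + c.fn)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f1": _ratio(2 * ppv * sens, ppv + sens),
        "lr_positive": _ratio(sens, 1 - spec),
        "lr_negative": _ratio(1 - sens, spec),
    }


def brier_score(y_true, y_prob) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (hypothetical, not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05]

counts = confusion_counts(y_true, y_prob)
for name, value in diagnostic_metrics(counts).items():
    print(f"{name}: {value:.3f}")
print(f"brier: {brier_score(y_true, y_prob):.4f}")
```

A lower Brier score indicates better probabilistic predictions. Unlike the other metrics here, it does not depend on a decision threshold.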

Abstract. Background: Effective cardiovascular preventive strategies are crucial among people living with HIV (PLWH), who are facing a high burden of atherosclerotic cardiovascular disease (ASCVD). However, it remains unclear which cardiovascular risk score is the most appropriate in clinical practice. Purpose: We aimed to prospectively assess and compare the accuracy of widely used cardiovascular risk scores in PLWH and individuals from the general population. Methods: We used data from the Swiss HIV Cohort Study (SHCS), a longitudinal study involving 20,802 HIV-infected adults aged over 18 years, and from the CoLaus|PsyCoLaus study, a Swiss population-based cohort including 6,733 individuals aged 35–75 years. The European Systematic Coronary Risk Evaluation Score (SCORE), the North American Pooled Cohort Equation (PCE), and the HIV-specific Data Collection on Adverse events of Anti-HIV Drugs (D:A:D) score were calculated for all participants free from ASCVD between January 1, 2003 and December 31, 2009. Accuracy of the scores was assessed based on discrimination and calibration metrics for each cohort separately, using incident ASCVD as the outcome. The value of adding HIV-specific factors to whichever of SCORE and PCE showed the better predictive capacity was evaluated using the net reclassification index (NRI). Results: 6,373 PLWH (28.4% women; mean age 40.6 [SD, 9.9] years; 57.2% on antiretroviral therapy) and 5,403 individuals from the general population (53.5% women; mean age 52.8 [SD, 10.7] years) were included in the analysis, with a mean follow-up time of 13.5 (SD, 4.1) and 9.9 (SD, 2.3) years, respectively. 533 (8.4%) participants in the SHCS and 374 (6.9%) in the CoLaus|PsyCoLaus study experienced an incident ASCVD, translating into age-adjusted incidence rates of 12.9 vs. 7.5 per 1,000 person-years, respectively.
In SHCS, PCE and D:A:D presented discriminative capacities with AUROC of 0.757 (95% CI, 0.736–0.777) and 0.763 (95% CI, 0.743–0.783), respectively, compared to SCORE (0.704 [95% CI, 0.681–0.728]). Calibration of all scores was suboptimal in SHCS, with under-prediction of ASCVD in the higher deciles of risk compared to the CoLaus|PsyCoLaus study. Adding CD4 nadir (<200 cells/mm³) and abacavir exposure as categorical variables to PCE resulted in a marginal improvement in discrimination and in a global NRI of 2.7% (95% CI, 0.3–5.1, p-value = 0.03). Conclusions: PLWH presented a two-fold higher rate of incident ASCVD compared to individuals of the same age from the general population. The accuracy of the PCE score to predict ASCVD in PLWH is equivalent to that of the D:A:D score, and PCE may represent a better alternative due to its reduced set of variables and its widespread use. Adding HIV-specific factors to PCE did not improve its predictive performance. Funding Acknowledgement: Type of funding sources: Public grant(s) – National budget only. Main funding source(s): Swiss National Science Foundation
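Calibration by decile of risk, as mentioned in the abstract above, compares the mean predicted risk with the observed event rate within groups of participants ordered by predicted risk. The Python sketch below shows one simple way to tabulate this. It is an illustration under stated assumptions, not the study's method: the function name, the equal-size grouping rule, and the toy data are all hypothetical.

```python
def calibration_by_group(y_true, y_prob, n_groups=10):
    """Tabulate mean predicted risk vs. observed event rate per risk group.

    Participants are sorted by predicted risk and cut into n_groups
    consecutive groups of (nearly) equal size; n_groups=10 gives deciles.
    Returns a list of (group, size, mean_predicted, observed_rate) tuples.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer participants than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((g + 1, len(chunk), mean_pred, observed))
    return rows


# Toy data (hypothetical): the low-risk half is well calibrated, while the
# high-risk half has more events than predicted (under-prediction).
y_prob = [0.1] * 10 + [0.5] * 10
y_true = [1] + [0] * 9 + [1] * 8 + [0] * 2

for group, size, mean_pred, observed in calibration_by_group(y_true, y_prob, n_groups=2):
    flag = "under-predicted" if observed > mean_pred + 1e-9 else "ok or over-predicted"
    print(f"group {group} (n={size}): predicted {mean_pred:.2f}, observed {observed:.2f} -> {flag}")
```

An observed rate above the mean predicted risk in the top groups corresponds to the pattern the abstract describes as under-prediction in the higher deciles.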

Articles published on Calibration Metrics

Can We Geographically Validate a Natural Language Processing Algorithm for Automated Detection of Incidental Durotomy Across Three Independent Cohorts From Two Continents?

Comparing Machine Learning to Regression Methods for Mortality Prediction Using Veterans Affairs Electronic Health Record Clinical Data.

Abstract P021: Performance Of Pooled Cohort Equations And MESA Risk Score Across Race/Ethnicity And Socioeconomic Status To Estimate 10-year Cardiovascular Risk In Diverse New England Cohort

Predicting the long-term cognitive trajectories using machine learning approaches: A Chinese nationwide longitudinal database

Graphical calibration curves and the integrated calibration index (ICI) for competing risk models

Hydrological process knowledge in catchment modelling – Lessons and perspectives from 60 years development

Impacts of spatiotemporal resolution and tiling on SLEUTH model calibration and forecasting for urban areas with unregulated growth patterns

Development and Validation of a Predictive Model to Identify Patients With an Ascending Thoracic Aortic Aneurysm.

Supplementing Existing Societal Risk Models for Surgical Aortic Valve Replacement With Machine Learning for Improved Prediction.

Machine learning-based clinical outcome prediction in surgery for acromegaly

Cardiovascular risk assessment in people living with HIV compared to the general population

Prediction of Major Adverse Events After Endovascular Aneurysm Repair Using a Machine Learning Model

Jointly Calibrating Hydrologic Model Parameters and State Adjustments

Want to model a species niche? A step-by-step guideline on correlative ecological niche modelling

Numerical Modeling of the Effects of Toe Configuration on Throughflow in Rockfill Dams

Class-wise Calibration: A Case Study on COVID-19 Hate Speech

Toward Dynamic Risk Prediction of Outcomes After Coronary Artery Bypass Graft: Improving Risk Prediction With Intraoperative Events Using Gradient Boosting.

An Improvement of Survival Stratification in Glioblastoma Patients via Combining Subregional Radiomics Signatures.

Performance assessment of the metastatic spinal tumor frailty index using machine learning algorithms: limitations and future directions.

Using machine learning to improve risk prediction in durable left ventricular assist devices.
