Complex Predictive Models Research Articles

Background: Despite the promise of machine learning (ML) to inform individualized medical care, the clinical utility of ML in medicine has been limited by the minimal interpretability and black box nature of these algorithms.

Objective: The study aimed to demonstrate a general and simple framework for generating clinically relevant and interpretable visualizations of black box predictions to aid in the clinical translation of ML.

Methods: To obtain improved transparency of ML, simplified models and visual displays can be generated using common methods from clinical practice such as decision trees and effect plots. We illustrated the approach based on postprocessing of ML predictions, in this case random forest predictions, and applied the method to data from the Left Ventricular (LV) Structural Predictors of Sudden Cardiac Death (SCD) Registry for individualized risk prediction of SCD, a leading cause of death.

Results: With the LV Structural Predictors of SCD Registry data, SCD risk predictions are obtained from a random forest algorithm that identifies the most important predictors, nonlinearities, and interactions among a large number of variables while naturally accounting for missing data. The black box predictions are postprocessed using classification and regression trees into a clinically relevant and interpretable visualization. The method also quantifies the relative importance of an individual or a combination of predictors. Several risk factors (heart failure hospitalization, cardiac magnetic resonance imaging indices, and serum concentration of systemic inflammation) can be clearly visualized as branch points of a decision tree to discriminate between low-, intermediate-, and high-risk patients.

Conclusions: Through a clinically important example, we illustrate a general and simple approach to increase the clinical translation of ML through clinician-tailored visual displays of results from black box algorithms. We illustrate this general model-agnostic framework by applying it to SCD risk prediction. Although we illustrate the methods using SCD prediction with random forest, the methods presented are applicable more broadly to improving the clinical translation of ML, regardless of the specific ML algorithm or clinical application. As any trained predictive model can be summarized in this manner to a prespecified level of precision, we encourage the use of simplified visual displays as an adjunct to the complex predictive model. Overall, this framework can allow clinicians to peek inside the black box and develop a deeper understanding of the most important features from a model to gain trust in the predictions and confidence in applying them to clinical care.
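The postprocessing idea in this abstract, summarizing a black box's predictions with a shallow tree, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `black_box_risk` is a hypothetical stand-in for a trained model, the two predictors (age and ejection fraction) and their cut-offs are made up, and the small CART-style fitter is written out in plain Python only to keep the example self-contained. The fidelity value is the share of variance in the black box predictions that the tree reproduces, one way to read "a prespecified level of precision".

```python
def black_box_risk(age, ef):
    """Hypothetical stand-in for a trained black box (e.g. a random forest)."""
    risk = 0.05
    if ef < 35:          # made-up ejection-fraction cut-off
        risk += 0.30
        if age > 60:     # made-up age cut-off
            risk += 0.25
    return risk


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(rows, preds, depth):
    """Greedy CART-style regression tree fitted to the black box predictions."""
    mean = sum(preds) / len(preds)
    if depth == 0 or sse(preds) < 1e-12:
        return {"leaf": mean}
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [p for r, p in zip(rows, preds) if r[j] <= t]
            right = [p for r, p in zip(rows, preds) if r[j] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:  # no feature has two distinct values to split on
        return {"leaf": mean}
    _, j, t = best
    left_idx = [i for i, r in enumerate(rows) if r[j] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([rows[i] for i in left_idx],
                         [preds[i] for i in left_idx], depth - 1),
        "right": fit_tree([rows[i] for i in right_idx],
                          [preds[i] for i in right_idx], depth - 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean prediction."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


# Synthetic grid of (age, ejection fraction) pairs; not registry data.
rows = [(age, ef) for age in range(30, 90, 5) for ef in range(15, 75, 5)]
preds = [black_box_risk(age, ef) for age, ef in rows]

tree = fit_tree(rows, preds, depth=2)
residual = sum((p - predict(tree, r)) ** 2 for r, p in zip(rows, preds))
fidelity = 1 - residual / sse(preds)  # R^2 of the tree against the black box
print(tree["feature"], tree["threshold"], round(fidelity, 3))
```

Because the tree is fitted to the model's predictions rather than to observed outcomes, its branch points describe what the black box does, not necessarily what is true of patients; a low fidelity would signal that a deeper tree or a different summary is needed.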


Background: Identifying people at risk of cardiovascular diseases (CVD) is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors with sub-optimal performance across all patient groups. Data-driven techniques based on machine learning (ML) might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them. We tested (1) whether ML techniques based on a state-of-the-art automated ML framework (AutoPrognosis) could improve CVD risk prediction compared to traditional approaches, and (2) whether considering non-traditional variables could increase the accuracy of CVD risk predictions.

Methods and findings: Using data on 423,604 participants without CVD at baseline in UK Biobank, we developed an ML-based model for predicting CVD risk based on 473 available variables. Our ML-based model was derived using AutoPrognosis, an algorithmic tool that automatically selects and tunes ensembles of ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms). We compared our model with a well-established risk prediction algorithm based on conventional CVD risk factors (Framingham score), a Cox proportional hazards (PH) model based on familiar risk factors (i.e., age, gender, smoking status, systolic blood pressure, history of diabetes, reception of treatments for hypertension and body mass index), and a Cox PH model based on all of the 473 available variables. Predictive performances were assessed using area under the receiver operating characteristic curve (AUC-ROC). Overall, our AutoPrognosis model improved risk prediction (AUC-ROC: 0.774, 95% CI: 0.768-0.780) compared to Framingham score (AUC-ROC: 0.724, 95% CI: 0.720-0.728, p < 0.001), Cox PH model with conventional risk factors (AUC-ROC: 0.734, 95% CI: 0.729-0.739, p < 0.001), and Cox PH model with all UK Biobank variables (AUC-ROC: 0.758, 95% CI: 0.753-0.763, p < 0.001). Out of 4,801 CVD cases recorded within 5 years of baseline, AutoPrognosis was able to correctly predict 368 more cases compared to the Framingham score. Our AutoPrognosis model included predictors that are not usually considered in existing risk prediction models, such as the individuals’ usual walking pace and their self-reported overall health rating. Furthermore, our model improved risk prediction in potentially relevant sub-populations, such as in individuals with history of diabetes. We also highlight the relative benefits accrued from including more information into a predictive model (information gain) as compared to the benefits of using more complex models (modeling gain).

Conclusions: Our AutoPrognosis model improves the accuracy of CVD risk prediction in the UK Biobank population. This approach performs well in traditionally poorly served patient subgroups. Additionally, AutoPrognosis uncovered novel predictors for CVD that may now be tested in prospective studies. We found that the “information gain” achieved by considering more risk factors in the predictive model was significantly higher than the “modeling gain” achieved by adopting complex predictive models.
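For readers unfamiliar with the AUC-ROC metric used to compare the models above, the sketch below computes it from its rank interpretation: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The labels and the two score lists are invented toy values, and the names `baseline_scores` and `richer_scores` are hypothetical; nothing here reproduces the study's data, models, or results.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    labels: 1 for a case (event occurred), 0 for a non-case.
    scores: predicted risks, higher meaning more likely to be a case.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0      # pair ranked correctly
            elif c == n:
                wins += 0.5      # tie counts as half
    return wins / (len(cases) * len(non_cases))


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
baseline_scores = [0.9, 0.4, 0.3, 0.5, 0.35, 0.2, 0.1, 0.05]
richer_scores = [0.9, 0.6, 0.3, 0.5, 0.35, 0.2, 0.1, 0.05]

auc_baseline = auc_roc(labels, baseline_scores)
auc_richer = auc_roc(labels, richer_scores)
print(auc_baseline, auc_richer)
```

The pairwise loop is quadratic in the number of participants, so for a cohort of the size described in the abstract a rank-based or library implementation would be the practical choice; the loop is used here only because it mirrors the definition directly.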


Articles published on Complex Predictive Models

Missing Data Imputation for Multisite Rainfall Networks: A Comparison between Geostatistical Interpolation and Pattern-Based Estimation on Different Terrain Types

Multilevel-Modeling Interpretation of Trailing-Edge Noise Models for Wind Turbines with NACA 0012 Airfoil

Improving Clinical Translation of Machine Learning Approaches Through Clinician-Tailored Visual Displays of Black Box Algorithms: Development and Validation.

DALEX: Explainers for Complex Predictive Models in R

Geoinformation systems in the population analysis of the spread of depressive disorders in Khabarovsk

Child‐related attributions of hostile intent and harsh discipline: Moderating effects of anger

Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants.

Predicting scheduled hospital attendance with artificial intelligence

A Path to Prediction of Outcomes in Juvenile Idiopathic Inflammatory Myopathy.

Soft Matter Informatics: Current Progress and Challenges

An energy-efficient internet of things (IoT) architecture for preventive conservation of cultural heritage

Family Impact and Parenting Styles in Families of Children with ADHD

Impact of environmental variables on Dubas bug infestation rate: A case study from the Sultanate of Oman.

Impact of functional somatic symptoms on 5–7-year-olds' healthcare use and costs

Severity Scores in Emergency Department Patients With Presumed Infection: A Prospective Validation Study.

Modelling of an Extended Brutedl Algorithm for Rule Extraction

Analysis of Low Intensity Laser Therapy as adjuvant to Photodynamic Therapy in Nonmelanoma Skin Cancer.

Self-rated health and hospital services use in the Spanish National Health System: a longitudinal study.

The Application of Sovereign Bond Spreads: The Case of United Kingdom, Iceland, Norway, Switzerland and Russia

Estimating Storm-Induced Dune Erosion and Overtopping along U.S. West Coast Beaches
