Dataset Shift Research Articles

Temporal dataset shift associated with changes in healthcare over time is a barrier to deploying machine learning-based clinical decision support systems. Domain generalization (DG) and unsupervised domain adaptation (UDA) algorithms, which learn robust models by estimating properties that are invariant across time periods, might be suitable for proactively mitigating dataset shift. The objective was to characterize the impact of temporal dataset shift on clinical prediction models and to benchmark DG and UDA algorithms for improving model robustness. In this cohort study, intensive care unit patients from the MIMIC-IV database were categorized by year group (2008–2010, 2011–2013, 2014–2016 and 2017–2019). The tasks were predicting mortality, long length of stay, sepsis and invasive ventilation, with feedforward neural networks as the prediction models. The baseline experiment trained models using empirical risk minimization (ERM) on 2008–2010 (ERM[08–10]) and evaluated them on subsequent year groups. The DG experiment trained models using algorithms that estimated invariant properties from 2008–2016 and evaluated them on 2017–2019. The UDA experiment leveraged unlabelled samples from 2017–2019 for unsupervised distribution matching. DG and UDA models were compared with ERM[08–16] models trained on 2008–2016. The main performance measures were the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve and the absolute calibration error. Threshold-based metrics, including false positives and false negatives, were used to assess the clinical impact of temporal dataset shift and its mitigation strategies. In the baseline experiments, dataset shift was most evident for sepsis prediction (maximum AUROC drop, 0.090; 95% confidence interval (CI), 0.080–0.101). In a scenario of 100 consecutively admitted patients, ERM[08–10] applied to 2017–2019 was associated with one additional false negative among 11 patients with sepsis, compared with the same model applied to 2008–2010. Compared with ERM[08–16], the DG and UDA experiments failed to produce more robust models (range of AUROC difference, −0.003 to 0.050). In conclusion, DG and UDA failed to produce more robust models than ERM in the setting of temporal dataset shift. Alternative approaches are required to preserve model performance over time in clinical medicine.
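The baseline comparison described in this abstract, where a model trained on one year group is scored on later year groups and the AUROC drop is reported with a 95% CI, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the abstract does not say how its confidence intervals were computed, so a percentile bootstrap that resamples each test period independently is assumed, and `auroc`, `resample` and `bootstrap_auroc_drop` are hypothetical helper names. The toy labels and scores are made up and are not MIMIC-IV results.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[int], List[float]]  # (labels, scores) for one test period


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def resample(sample: Sample, rng: random.Random) -> Sample:
    """Draw a bootstrap resample (with replacement) of one test period."""
    labels, scores = sample
    idx = [rng.randrange(len(labels)) for _ in labels]
    return [labels[i] for i in idx], [scores[i] for i in idx]


def bootstrap_auroc_drop(ref: Sample, later: Sample, n_boot: int = 1000,
                         alpha: float = 0.05,
                         seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """AUROC(ref) - AUROC(later) with a percentile bootstrap CI (assumed method).

    Each period is resampled independently; resamples containing a single
    class are discarded because AUROC is undefined for them.
    """
    rng = random.Random(seed)
    point = auroc(*ref) - auroc(*later)
    drops: List[float] = []
    while len(drops) < n_boot:
        ref_b, later_b = resample(ref, rng), resample(later, rng)
        if len(set(ref_b[0])) < 2 or len(set(later_b[0])) < 2:
            continue
        drops.append(auroc(*ref_b) - auroc(*later_b))
    drops.sort()
    k = int(alpha / 2 * n_boot)  # index of the lower percentile
    return point, (drops[k], drops[n_boot - 1 - k])


# Toy scores from one model on an in-period test set and on a later period
# (made-up numbers, not study results).
ref = ([0, 0, 0, 0, 1, 1, 1, 1], [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
later = ([0, 0, 0, 0, 1, 1, 1, 1], [0.1, 0.5, 0.3, 0.7, 0.6, 0.4, 0.8, 0.9])
point, (lo, hi) = bootstrap_auroc_drop(ref, later, n_boot=500)
print(f"AUROC drop {point:.4f} (95% CI {lo:.3f} to {hi:.3f})")
```

With these toy scores the printed point estimate is 0.1875 (AUROC 1.0 in the reference period versus 0.8125 later); the interval is wide because each toy period has only eight patients. The pairwise AUROC is quadratic in test-set size, which is acceptable for a sketch, but a rank-based or library implementation would be preferable for cohorts of ICU size.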

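The abstract does not name the UDA algorithms it benchmarked. One common form of unsupervised distribution matching adds a maximum mean discrepancy (MMD) penalty between hidden representations of labelled training-period samples (here 2008–2016) and unlabelled target-period samples (2017–2019). The sketch below illustrates that idea under this assumption and is not the study's implementation: it computes a biased squared-MMD estimate with a Gaussian (RBF) kernel, and the bandwidth, the weight, the toy vectors and the function names (`mmd2`, `uda_objective`) are all hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(source: Sequence[Vector], target: Sequence[Vector],
         bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))


def uda_objective(source_task_loss: float, source_feats: Sequence[Vector],
                  target_feats: Sequence[Vector], weight: float = 1.0,
                  bandwidth: float = 1.0) -> float:
    """Supervised loss on labelled source data plus a weighted MMD penalty.

    Target samples enter only through the MMD term, so no target labels are used.
    """
    return source_task_loss + weight * mmd2(source_feats, target_feats, bandwidth)


# Toy 2-D "hidden representations" (made-up values).
source = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]  # e.g. labelled earlier years
target = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]  # e.g. unlabelled later years
print(mmd2(source, source))        # identical samples -> 0.0
print(mmd2(source, target) > 0.5)  # shifted samples -> True
```

In an actual training loop the features would come from a hidden layer of the network, and the penalty would be minimized jointly with the task loss by gradient descent; the bandwidth and weight would normally be tuned rather than fixed at 1.0.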

Background: Accurate prediction of outcomes following a heart transplant is critical for explaining risks and benefits to patients and for decision-making when considering potential organ offers. Given the large number of potential variables to consider, this task may be performed most efficiently using machine learning (ML).

Purpose: We trained and tested different ML algorithms to predict outcomes following cardiac transplantation using the United Network for Organ Sharing (UNOS) database.

Methods: We included 67,939 adult and pediatric patients enrolled in the UNOS database between January 1994 and December 2016 who underwent cardiac transplantation (median age 53 [IQR 38–60], 72.7% male). As input, the models used 114 features collected from recipients and donors before transplant. The primary outcome was all-cause mortality at one year post-transplant. We evaluated three ML methods: XGBoost, Random Forest (RF) and L2-regularized logistic regression. Algorithms were trained and tested using shuffled 10-fold cross-validation (CV) as well as rolling CV. In the rolling CV, which mimics a prospective procedure, models were trained by incrementally adding patients according to transplant year and tested on data from the following year. Hyperparameters were tuned using Bayesian optimization. Prognostic accuracy for one-year all-cause mortality was characterized using the area under the receiver operating characteristic curve (AUC).

Results: In total, 8,394 patients died within 1 year of transplant. We observed a substantial difference in prognostic accuracy between the shuffled 10-fold CV and the rolling CV. In the 10-fold CV, XGBoost and RF achieved high predictive performance, with AUCs of 0.848 (95% CI: 0.842–0.854) and 0.891 (95% CI: 0.886–0.896), respectively. In the rolling CV, which is a more realistic setting, AUC dropped to 0.673 (95% CI: 0.661–0.684) for XGBoost and 0.670 (95% CI: 0.657–0.683) for RF. The predictive performance of L2-regularized logistic regression remained stable across the two CV procedures, with an AUC of 0.669 (95% CI: 0.662–0.676) in the shuffled 10-fold CV and 0.665 (95% CI: 0.649–0.680) in the rolling CV (Figure).

Conclusions: Our study suggests that ML models could be used to predict mortality in the first year post-transplant. We also show that the choice of CV procedure is crucial for evaluating ML models, particularly on data collected over a long period. The difference in predictive performance of the tree-based models between the shuffled and rolling CV might indicate temporal dataset shift. In the rolling CV, all three methods achieved similar predictive performance.

Funding acknowledgement: Type of funding sources: public grant(s), national budget only. Main funding source(s): Research Foundation Flanders (FWO).
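The shuffled and rolling CV procedures in the heart-transplant study differ only in how records are assigned to training and test sets. The sketch below assumes each record carries its transplant year and uses hypothetical helper names (`shuffled_kfold`, `rolling_cv`); it generates index splits only and does not reproduce the study's models or its Bayesian hyperparameter search. In the rolling scheme the training set grows one year at a time and the test set is the next observed year, so every training record predates every test record.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def shuffled_kfold(n: int, k: int = 10, seed: int = 0) -> Iterator[Split]:
    """Shuffled k-fold CV: each record is tested exactly once, regardless of year."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


def rolling_cv(years: Sequence[int], min_train_years: int = 1) -> Iterator[Split]:
    """Rolling (expanding-window) CV: train on all years before t, test on year t."""
    distinct = sorted(set(years))
    for pos in range(min_train_years, len(distinct)):
        test_year = distinct[pos]
        train = [i for i, y in enumerate(years) if y < test_year]
        test = [i for i, y in enumerate(years) if y == test_year]
        yield train, test


years = [1994, 1994, 1995, 1995, 1996, 1996, 1997]  # toy transplant years
for train, test in rolling_cv(years):
    print(sorted({years[i] for i in train}), "->", years[test[0]])

for train, test in shuffled_kfold(len(years), k=3):
    print("shuffled test fold years:", sorted(years[i] for i in test))
```

The rolling loop prints `[1994] -> 1995`, `[1994, 1995] -> 1996` and `[1994, 1995, 1996] -> 1997`. Shuffled test folds, by contrast, usually mix years, so a model can be trained on records from after some of the patients it is tested on; the abstract reports much higher AUCs for the tree-based models under shuffled CV than under rolling CV.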


Articles published on Dataset Shift

Radio Galaxy Zoo: using semi-supervised learning to leverage large unlabelled data sets for radio galaxy classification under data set shift

A Unified Framework on Generalizability of Clinical Prediction Models.

Multi-Level Stacked Regression for predicting electricity consumption of Hot Rolling Mill

Automatic Fish Age Determination across Different Otolith Image Labs Using Domain Adaptation

Comparison of different ML methods concerning prediction quality, domain adaptation and robustness

Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine

Assessing the performance of control charts for detecting previously unexplored shift types in high density spatial data

Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures.

Towards robust partially supervised multi-structure medical image segmentation on small-scale data

Action Recognition Using Close-Up of Maximum Activation and ETRI-Activity3D LivingLab Dataset.

Temporal shift and accuracy of machine learning in heart transplant outcomes

Preventing dataset shift from breaking machine-learning biomarkers.

Inertial load classification of low-cost electro-mechanical systems under dataset shift with fast end of line testing

Out-of-time cross-validation strategies for classification in the presence of dataset shift

Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine.

Generalized Bayes Quantification Learning under Dataset Shift

Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications.

DIBS: Diversity Inducing Information Bottleneck in Model Ensembles

Finding Quasars behind the Galactic Plane. I. Candidate Selections with Transfer Learning

Viral Pneumonia Screening on Chest X-Rays Using Confidence-Aware Anomaly Detection.
