Imputed Datasets Research Articles

Objective: The development of clinical prediction models is often impeded by missing values in the predictors. Various methods for imputing missing values before modelling have been proposed. Some are based on variants of multiple imputation by chained equations, while others are based on single imputation. These methods may include elements of flexible modelling or machine learning algorithms, and user-friendly software packages are available for some of them. The aim of this study was to investigate by simulation whether some of these methods consistently outperform others on the performance measures of clinical prediction models.

Study Design and Setting: We simulated development and validation cohorts by mimicking the observed distributions of the predictors and the outcome variable of a real data set. In the development cohorts, missing predictor values were created in 36 scenarios defined by the missingness mechanism and the proportion of incomplete cases. We applied three imputation algorithms available in R: mice, aregImpute and missForest. These algorithms differed in whether they used linear models, flexible models, or random forests; in how they sampled from the posterior predictive distribution; and in whether they generated a single or multiple imputed data sets. For multiple imputation, we also investigated the impact of the number of imputations. Logistic regression models were fitted to the simulated development cohorts before missing values were generated (full data analysis), to the complete cases after missing values were generated (complete case analysis), and to the imputed data. Prognostic model performance was measured by the scaled Brier score, the c-statistic, the calibration intercept and slope, and the mean absolute prediction error, all evaluated in validation cohorts without missing values. The performance of the full data analysis was taken as the ideal.

Results: None of the imputation methods achieved the predictive accuracy that the model would attain with no missing data. In general, complete case analysis yielded the worst performance, and the deviation from ideal performance increased with increasing percentage of missingness and decreasing sample size. Across all scenarios and performance measures, aregImpute and mice, both with 100 imputations, yielded the highest predictive accuracy. Surprisingly, aregImpute outperformed even the full data analysis in achieving calibration slopes very close to 1 across all scenarios and outcome models. Increasing the number of imputations in mice from 5 to 100 improved performance only marginally. The differences between the imputation methods decreased with increasing sample size and decreasing proportion of incomplete cases.

Conclusion: In our simulation study, the choice of imputation method affected model calibration more than model discrimination. Although differences in model performance between imputation methods were generally small, multiple imputation methods such as mice and aregImpute, which can handle linear or nonlinear associations between predictors and outcome, are an attractive and reliable choice in most situations.
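The validation measures named in the abstract can be illustrated with a short Python sketch for a binary outcome. This is not the study's code (the authors worked in R), and several conventions are assumptions: the Brier score is scaled by the Brier score of a model that predicts the observed prevalence for everyone; the calibration intercept is estimated with the linear predictor as an offset (slope fixed at 1), and the calibration slope by regressing the outcome on the logit of the predictions; the mean absolute prediction error compares predicted probabilities with the true event probabilities, which are known in a simulation. All function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p):
    return math.log(p / (1.0 - p))


def scaled_brier(y, p):
    """1 - Brier / Brier of a model predicting the observed prevalence for everyone."""
    n = len(y)
    brier = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / n
    prev = sum(y) / n
    return 1.0 - brier / (prev * (1.0 - prev))


def c_statistic(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for a in pos:
        for b in neg:
            score += 1.0 if a > b else (0.5 if a == b else 0.0)
    return score / (len(pos) * len(neg))


def fit_logistic(x, y, fix_slope=False, iters=100, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1) = a + b*x.

    With fix_slope=True, b stays at 1, so x acts as an offset and only a is estimated.
    """
    a, b = 0.0, 1.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g_a += yi - mu          # score for the intercept
            g_b += (yi - mu) * xi   # score for the slope
            h_aa += w               # Fisher information entries
            h_ab += w * xi
            h_bb += w * xi * xi
        if fix_slope:
            da, db = g_a / h_aa, 0.0
        else:
            det = h_aa * h_bb - h_ab * h_ab
            da = (h_bb * g_a - h_ab * g_b) / det
            db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def calibration(y, p):
    """Return (calibration intercept, calibration slope) on validation data."""
    lp = [logit(pi) for pi in p]
    intercept, _ = fit_logistic(lp, y, fix_slope=True)  # slope fixed at 1 (offset)
    _, slope = fit_logistic(lp, y)                       # slope freely estimated
    return intercept, slope


def mape(p_true, p_hat):
    """Mean absolute difference between true and predicted event probabilities."""
    return sum(abs(t - h) for t, h in zip(p_true, p_hat)) / len(p_hat)
```

As a sanity check, a logistic model evaluated on its own development data has a calibration intercept of 0 and a slope of 1. In external validation, a slope below 1 indicates predictions that are too extreme (typical of overfitting), which is why a slope close to 1 is the target.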

Interventions are needed to address delays in treatment-seeking and low treatment coverage among people consuming methamphetamine. We aim to determine whether a self-administered smartphone-based intervention, the S-Check app, can increase help-seeking and motivation to change methamphetamine use, and to identify factors associated with app engagement. This study is a randomized, 28-day waitlist-controlled trial. Consenting adults residing in Australia who reported using methamphetamine at least once in the last month were eligible to download the app for free from the Android or iOS app stores. Participants randomized to the intervention group had immediate access to the S-Check app; the control group was wait-listed for 28 days before gaining access, and all participants then had access until day 56. Actual help-seeking was assessed with the modified Actual Help Seeking Questionnaire (mAHSQ), intention to seek help with the modified General Help Seeking Questionnaire, and motivation to change methamphetamine use with the modified readiness ruler. The proportions of positive responses to the mAHSQ, modified General Help Seeking Questionnaire, and modified readiness ruler were compared between the 2 groups with χ² tests. Logistic regression models compared the odds of actual help-seeking, intention to seek help, and motivation to change at day 28 between the 2 groups. Secondary outcomes were the most commonly accessed features of the app, methamphetamine use, feasibility and acceptability of the app, and associations between S-Check app engagement and participant demographic and methamphetamine use characteristics. In total, 560 participants downloaded the app; 259 (46.3%) completed eConsent and the baseline assessment; and 84 (32.4%) provided data on day 28. Participants in the immediate access group were more likely to seek professional help (mAHSQ) at day 28 than those in the control group (n=15, 45.5% vs n=12, 23.5%; χ²(1)=4.42, P=.04).
There was no significant difference between the 2 groups in the odds of actual help-seeking, intention to seek help, or motivation to change methamphetamine use in the primary logistic regression analyses. In the ancillary analyses, however, the imputed data set showed significantly higher odds of seeking professional help in the immediate access group than in the waitlist control group (adjusted odds ratio 2.64, 95% CI 1.19-5.83, P=.02). Among participants not seeking help at baseline, each additional minute in the app was associated with an 8% increase in the likelihood of seeking professional help by day 28 (ratio 1.08, 95% CI 1.02-1.22, P=.04). Among the intervention group, a 10-minute increase in app engagement time was associated with 0.4 fewer days of methamphetamine use (regression coefficient [β] -0.04, P=.02). The S-Check app is a feasible, low-resource, self-administered intervention for adults in Australia who consume methamphetamine. Study attrition was high; although this is common in mobile health interventions, it warrants larger studies of the S-Check app. Trial registration: Australian New Zealand Clinical Trials Registry ACTRN12619000534189; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=377288&isReview=true.
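The day-28 χ² result for professional help-seeking can be reproduced from a 2×2 table. The abstract gives only numerators and percentages, so the denominators used here are back-calculated: 15 of 33 (45.5%) in the immediate access group and 12 of 51 (23.5%) in the waitlist group, which together account for the 84 participants with day-28 data. The sketch assumes a Pearson χ² test without continuity correction; the function name is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
        [[a, b],
         [c, d]]
    with rows = groups and columns = (sought help, did not seek help).
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    chi2 = 0.0
    for o, r, k in zip(observed, row_totals, col_totals):
        expected = r * k / n  # expected count under independence
        chi2 += (o - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Immediate access: 15 of 33 sought professional help; waitlist control: 12 of 51.
chi2, p = pearson_chi2_2x2(15, 33 - 15, 12, 51 - 12)
print(f"chi2(1) = {chi2:.2f}, P = {p:.3f}")
```

This prints chi2(1) = 4.42 and P = 0.036, which the abstract rounds to P=.04. With Yates' continuity correction the statistic would be about 3.47, so the reported value matches the uncorrected test.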

Articles published on Imputed Datasets

Evaluating the median p-value method for assessing the statistical significance of tests when using multiple imputation

Multiple imputation integrated to machine learning: predicting post-stroke recovery of ambulation after intensive inpatient rehabilitation

Determinants of Prevalence and Factors Associated with Anemia among Pregnant Women in Gambia: A Multivariate Analysis using DHS Data

Environmental chemical exposures and a machine learning-based model for predicting hypertension in NHANES 2003–2016

Predicting implementation of response to intervention in math using elastic net logistic regression.

Chronic obstructive pulmonary disease, asthma, and mechanical ventilation are risk factors for dyspnea in patients with long COVID: A Japanese nationwide cohort study

Comparative Analysis of Imputation Methods for Enhancing Predictive Accuracy in Data Models

Autoencoder imputation of missing heterogeneous data for Alzheimer's disease classification

Miesize: Effect-size calculation in imputed data

The performance of prognostic models depended on the choice of missing value imputation algorithm: a simulation study

The PROgnostic ModEl for chronic lung disease (PRO-MEL): development and temporal validation

Initial management of newly diagnosed WHO grade 2-3 adult meningioma following surgery: results from the Dutch Brain Tumour Registry (2016-2021).

Missing Values in Longitudinal Proteome Dynamics Studies: Making a Case for Data Multiple Imputation.

Machine learning assessment of vildagliptin and linagliptin effectiveness in type 2 diabetes: Predictors of glycemic control.

Post-endovascular therapy contrast extravasation in the mesial temporal region on dual-energy CT is associated with outcome in acute ischemic stroke patients

Randomized Trial of the Effectiveness of Videoconferencing-Based Versus Message-Based Psychotherapy on Depression.

Effect of elevated depressive symptoms during adolescence on health-related quality of life in young adulthood: a six-year cohort study with repeated exposure measurements.

Effect of a Smartphone App (S-Check) on Actual and Intended Help-Seeking and Motivation to Change Methamphetamine Use Among Adult Consumers of Methamphetamine in Australia: Randomized Waitlist-Controlled Trial.

Forecasting potential invaders to prevent future biological invasions worldwide.

A novel method for settlement imputation and monitoring of earth-rockfill dams subjected to large-scale missing data
