Abstract

Objective: Dropout from psychological interventions is associated with poor treatment outcomes and high health-related, societal, and economic costs. Recently, machine learning (ML) algorithms have been tested in psychotherapy outcome research. Dropout prediction is usually limited by imbalanced datasets and small sample sizes. This paper aims to improve dropout prediction by comparing ML algorithms, sample sizes, and resampling methods. Method: Twenty ML algorithms were examined in twelve subsamples (drawn from a sample of N = 49,602) using four resampling methods, which were compared to each other and to the absence of resampling. Prediction accuracy was evaluated on an independent holdout dataset using the F1 measure. Results: Resampling methods improved the performance of the ML algorithms, and down-sampling can be recommended, as it was the fastest method and as accurate as the others. A minimum sample size of N = 300 was necessary to reach the highest mean F1 score of .51. No specific algorithm or algorithm group can be recommended. Conclusion: Resampling methods can improve the accuracy of predicting dropout from psychological interventions. Down-sampling is recommended, as it is the least computationally taxing method. The training sample should contain at least 300 cases.
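To make the evaluation setup concrete, the following is a minimal sketch (in Python with scikit-learn; not the authors' code) of the core procedure the abstract describes: down-sample the majority class in the training data, fit a classifier, and score it with the F1 measure on an independent holdout set. The simulated dataset, the choice of classifier, and the split ratio are illustrative assumptions.

    # Minimal sketch of down-sampling plus holdout F1 evaluation.
    # Data, model, and split are illustrative, not the paper's setup.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # Simulated imbalanced dropout data: ~15% positive (dropout) cases.
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.85, 0.15], random_state=0)
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    # Down-sampling: keep all minority-class cases and draw an equally
    # sized random subset of the majority class.
    rng = np.random.default_rng(0)
    minority = np.flatnonzero(y_train == 1)
    majority = np.flatnonzero(y_train == 0)
    majority_down = rng.choice(majority, size=minority.size, replace=False)
    idx = np.concatenate([minority, majority_down])

    clf = RandomForestClassifier(random_state=0)
    clf.fit(X_train[idx], y_train[idx])

    # F1 on the untouched holdout set, mirroring the paper's evaluation.
    print("F1:", round(f1_score(y_hold, clf.predict(X_hold)), 2))

Down-sampling discards majority-class cases, so it trades information for speed; as the abstract notes, it was nonetheless as accurate here as the more computationally expensive alternatives.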
