Predicting patients who will drop out of out-patient psychotherapy using machine learning algorithms.

Björn Bennemann, Wolfgang Lutz, Brian Schwartz, Julia Giesemann

doi:10.1192/bjp.2022.17


Open Access

https://doi.org/10.1192/bjp.2022.17


Abstract

About 30% of patients drop out of cognitive-behavioural therapy (CBT), which has implications for psychiatric and psychological treatment. Findings concerning drop-out remain heterogeneous. This paper aims to compare different machine-learning algorithms using nested cross-validation, evaluate their benefit in naturalistic settings, and identify the best model as well as the most important variables. The data-set consisted of 2543 out-patients treated with CBT. Assessment took place before session one. Twenty-one algorithms and ensembles were compared. Two parameters (Brier score, area under the curve (AUC)) were used for evaluation. The best model was an ensemble that used Random Forest and nearest-neighbour modelling. During the training process, it was significantly better than generalised linear modelling (GLM) (Brier score: d = −2.93, 95% CI −3.95 to −1.90; AUC: d = 0.59, 95% CI 0.11 to 1.06). In the holdout sample, the ensemble correctly identified 63.4% of cases, whereas the GLM identified only 46.2% correctly. The most important predictors were lower education, lower scores on the Personality Style and Disorder Inventory (PSSI) compulsive scale, younger age, and higher scores on the PSSI negativistic and PSSI antisocial scales as well as on the Brief Symptom Inventory (BSI) additional scale (mean of the four additional items) and BSI overall scale. Machine learning improves drop-out predictions. However, not all algorithms are suited to naturalistic data-sets and binary events. Tree-based and boosted algorithms including a variable selection process seem well-suited, whereas more advanced algorithms such as neural networks do not.
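The two evaluation parameters named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `brier_score` and `auc` and the toy data are assumptions made for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than whatever software the study used.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (drop-out, completer) pairs in which the drop-out case gets the higher score.

    Ties count as half a win; 0.5 corresponds to chance, 1.0 to perfect ranking.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data (not from the study): 1 = dropped out, 0 = completed therapy.
outcomes = [1, 0, 0, 1, 0]
predicted = [0.8, 0.3, 0.6, 0.5, 0.1]

print(f"Brier score: {brier_score(outcomes, predicted):.3f}")
print(f"AUC: {auc(outcomes, predicted):.3f}")
```

The Brier score rewards well-calibrated probabilities, while the AUC only measures how well drop-out cases are ranked above completers, which is why a model comparison can differ in size between the two.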

Highlights

About 30% of patients drop out of cognitive–behavioural therapy (CBT), which has implications for psychiatric and psychological treatment
Not all algorithms are suited to naturalistic data-sets and binary events
The best model was an ensemble that used Random Forest and nearest-neighbour modelling. It was significantly better than generalised linear modelling (GLM) (Brier score: d = −2.93, 95% CI −3.95 to −1.90; area under the curve (AUC): d = 0.59, 95% CI 0.11 to 1.06)

Summary

Background

About 30% of patients drop out of cognitive–behavioural therapy (CBT), which has implications for psychiatric and psychological treatment.

Aims

This paper aims to compare different machine-learning algorithms using nested cross-validation, evaluate their benefit in naturalistic settings, and identify the best model as well as the most important variables.

Method

The data-set consisted of 2543 out-patients treated with CBT. Two parameters (Brier score, area under the curve (AUC)) were used for evaluation.

Results

The most important predictors were lower education, lower scores on the Personality Style and Disorder Inventory (PSSI) compulsive scale, younger age, and higher scores on the PSSI negativistic and PSSI antisocial scales as well as on the Brief Symptom Inventory (BSI) additional scale (mean of the four additional items) and BSI overall scale.
