Abstract
Objective: Session-by-session tracking of the working alliance enables clinicians to detect alliance deterioration and intervene accordingly, which has been shown to improve treatment outcomes and reduce dropout. Despite this, routine use of alliance self-report measures has not achieved widespread implementation. We aimed to develop an automated alliance prediction method using behavioral features obtained from video-recorded therapy sessions.

Method: A naturalistic dataset of session recordings with patient ratings of the working alliance was available for 252 in-person and teletherapy sessions from 47 patients treated by 10 clinicians. Text- and audio-based features were extracted from all 252 sessions; additional video-based feature extraction was possible for a subsample of 80 sessions. We developed two modeling pipelines, one combining audio and text features and one combining audio, text, and video features, to train machine learning regression models that fuse multimodal features.

Results: The best results were achieved with a Gradient Boosting architecture using audio, text, and video features extracted from the patient (ICC = 0.66, Pearson r = 0.70, MAE = 0.33).

Conclusion: Automated alliance prediction from video-recorded therapy sessions is feasible with high accuracy. A data-driven multimodal approach to feature extraction and selection enables powerful models that outperform previous work.
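To make the modeling setup concrete, the following is a minimal sketch of multimodal early fusion with gradient boosting regression, assuming scikit-learn's GradientBoostingRegressor and simple feature concatenation; the abstract does not specify the implementation, so the feature sets, dimensionalities, data split, and synthetic data below are illustrative assumptions, and the ICC computation reported in the paper is omitted here.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic placeholders for per-session feature vectors; names and
# dimensionalities are assumptions, not taken from the paper.
n_sessions = 80
audio_feats = rng.normal(size=(n_sessions, 20))    # e.g., prosodic features
text_feats = rng.normal(size=(n_sessions, 30))     # e.g., linguistic features
video_feats = rng.normal(size=(n_sessions, 15))    # e.g., facial/pose features
alliance = rng.uniform(1.0, 5.0, size=n_sessions)  # patient alliance ratings

# Early fusion: concatenate all modalities into one feature vector per session.
X = np.hstack([audio_feats, text_feats, video_feats])

X_train, X_test, y_train, y_test = train_test_split(
    X, alliance, test_size=0.25, random_state=0
)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Evaluate with two of the metrics reported in the abstract.
r, _ = pearsonr(y_test, pred)
mae = mean_absolute_error(y_test, pred)
print(f"Pearson r = {r:.2f}, MAE = {mae:.2f}")
```

Concatenation-based (early) fusion is only one way to combine modalities; the sketch uses it because it pairs naturally with tree-based models such as Gradient Boosting, which handle heterogeneous feature scales without normalization.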