A clinical site workload prediction model with machine learning lifecycle

Bilal Mirza, Xinyang Li, Kris Lauwers, Bhargava Reddy, Anja Muller, Craig Wozniak, Sina Djali

doi:10.1016/j.health.2023.100159

Abstract

In clinical trial monitoring, substantial resources are allocated to source data verification (SDV), which ensures that trial participant information is transcribed accurately and reliably. Clinical site visits for SDV are typically scheduled at a fixed frequency, without objectively factoring in the workload of individual sites. This often wastes resources and directly increases clinical trial cost. We leveraged historical data from several hundred clinical trials to predict SDV workload at trial sites using machine learning. Specifically, we adopted the cross-industry standard process for data mining (CRISP-DM) process model and devised a novel deep learning pipeline for longitudinal clinical trial data. The pipeline, which comprises a recurrent neural network-based encoder and decoder, ingests multivariate sequence data from study sites and predicts SDV workload for future months. We also developed an efficient model enhancement workflow, in a data science platform that facilitates machine learning operations best practices, for timely incorporation of new features and changes. Several enhancement iterations have been performed since the launch of the first SDV workload prediction model, and the latest model is more accurate than previous versions. We discuss these enhancements in the context of the CRISP-DM phases. In conclusion, the SDV workload prediction model has enabled informed planning and optimization of resources within clinical trial operations.
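
To make the encoder and decoder idea concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes a plain tanh recurrent cell, random untrained weights, and a site history matrix whose first column is past monthly SDV workload. The function names (`init_cell`, `encode`, `decode`, `forecast_workload`) and all sizes are hypothetical, chosen only for illustration. The encoder compresses the multivariate monthly history into one hidden state, and the decoder unrolls that state to emit one workload value per future month.

```python
import numpy as np


def init_cell(n_in, n_hidden, rng):
    """Random weights for a simple tanh recurrent cell (untrained, illustrative)."""
    return {
        "W_x": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "W_h": rng.normal(0.0, 0.1, size=(n_hidden, n_hidden)),
        "b": np.zeros(n_hidden),
    }


def rnn_step(cell, x, h):
    """One recurrent update: combine the current input with the previous state."""
    return np.tanh(x @ cell["W_x"] + h @ cell["W_h"] + cell["b"])


def encode(cell, history):
    """Summarize a (n_months, n_features) site history into one hidden state."""
    h = np.zeros(cell["W_h"].shape[0])
    for x in history:
        h = rnn_step(cell, x, h)
    return h


def decode(cell, w_out, b_out, h, last_workload, horizon):
    """Unroll the decoder, feeding each prediction back in as the next input."""
    preds = []
    y = float(last_workload)
    for _ in range(horizon):
        h = rnn_step(cell, np.array([y]), h)
        y = float(h @ w_out + b_out)  # linear readout to a scalar workload
        preds.append(y)
    return np.array(preds)


def forecast_workload(history, horizon, n_hidden=16, seed=0):
    """Forward pass only. Assumes column 0 of `history` is past SDV workload."""
    rng = np.random.default_rng(seed)
    encoder = init_cell(history.shape[1], n_hidden, rng)
    decoder = init_cell(1, n_hidden, rng)
    w_out = rng.normal(0.0, 0.1, size=n_hidden)
    state = encode(encoder, history)
    return decode(decoder, w_out, 0.0, state, history[-1, 0], horizon)


if __name__ == "__main__":
    # Synthetic stand-in for 12 months of 4 site-level features.
    history = np.random.default_rng(1).random((12, 4))
    print(forecast_workload(history, horizon=3))
```

Because the weights are random, the printed values carry no meaning; the sketch only shows how data would flow through such a pipeline. A trained model would fit these weights to historical site data and would likely use gated cells such as LSTM or GRU units.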

Full Text