Abstract
Over the course of a clinical trial, irregularities may arise in the data. Trialists implement human-intensive, expensive central statistical monitoring procedures to identify and correct these irregularities before the results of the trial are analyzed and disseminated. Machine learning algorithms have shown promise for identifying center-level irregularities in multi-center clinical trials with minimal human intervention. We aimed to characterize the form-level data irregularities in several historical clinical trials and to evaluate the ability of a machine learning-based outlier detection algorithm to identify them. Data irregularities previously identified by humans in historical clinical trials were ascertained by comparing preliminary snapshots of the trial databases to the final, locked databases. We measured the ability of the machine learning-based outlier detection algorithm to identify form-level irregularities using concordance (area under the receiver operating characteristic curve), positive predictive value (precision), and sensitivity (recall). We examined preliminary snapshots of seven historical clinical trials, which randomized a total of 77,001 participants. Across all snapshots and all trials, we extracted 1,267,484 completed entries from 358 case report forms that contained irregularities; these entries included 24,850 form-level irregularities (median form-level irregularity rate per form: 1.81%). Our proposed machine learning algorithm detected form-level irregularities with a median concordance of 0.74 (interquartile range: 0.57-0.89), slightly exceeding the performance of a previously proposed machine learning approach, which had a median concordance of 0.73 (interquartile range: 0.54-0.88).
Data irregularities in historical clinical trials were ascertained by comparing preliminary snapshots of the trial database to the final database. These irregularities can be categorized according to their scope. Irregularities can be successfully detected by a machine learning algorithm as early as, or earlier than, by a human, without human intervention. Such an approach may complement existing techniques for central statistical monitoring in large multi-center randomized controlled trials and possibly improve the efficiency of costly data verification processes.
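The evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout (entries keyed by an ID, each a dictionary of field values), the function names, the toy values, the outlier scores, and the 0.5 flagging threshold are all assumptions made for illustration. An entry is labeled irregular if any field differs between the snapshot and the final database, and the three metrics named in the abstract are then computed against those labels.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: entry ID -> {field name: recorded value}.
Entry = Dict[str, object]


def label_irregular(snapshot: Dict[str, Entry], final: Dict[str, Entry]) -> Dict[str, int]:
    """Label a snapshot entry 1 if it was changed in the final database, else 0.

    Assumption: entries missing from the final database are skipped.
    """
    return {
        entry_id: int(entry != final[entry_id])
        for entry_id, entry in snapshot.items()
        if entry_id in final
    }


def auroc(labels: List[int], scores: List[float]) -> float:
    """Concordance: probability that a random irregular entry outscores a
    random regular one, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels: List[int], scores: List[float], threshold: float) -> Tuple[float, float]:
    """Positive predictive value and sensitivity when flagging scores >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (invented): two entries were corrected before database lock.
snapshot = {"e1": {"sbp": 120}, "e2": {"sbp": 1200}, "e3": {"sbp": 118}, "e4": {"sbp": 35}}
final = {"e1": {"sbp": 120}, "e2": {"sbp": 120}, "e3": {"sbp": 118}, "e4": {"sbp": 135}}
labels_by_id = label_irregular(snapshot, final)

# Hypothetical outlier scores from some detector, one per entry.
scores_by_id = {"e1": 0.1, "e2": 0.9, "e3": 0.6, "e4": 0.4}
ids = sorted(labels_by_id)
labels = [labels_by_id[i] for i in ids]
scores = [scores_by_id[i] for i in ids]

concordance = auroc(labels, scores)
ppv, sensitivity = precision_recall(labels, scores, threshold=0.5)
print(concordance, ppv, sensitivity)
```

The pairwise concordance calculation is quadratic in the number of entries, which is acceptable for a sketch; at the scale reported in the abstract, a rank-based implementation from an established library would be the more practical choice.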