Randomized controlled trials (RCTs) are the gold standard for clinical research but may not accurately reflect the impact of medicines in real-world settings. Supplementing RCTs with insights from real-world data (RWD) can address known limitations by including more diverse patient populations, additional types of sites of care, and practices more representative of the care most people receive. One current challenge in using RWD is the lack of an algorithmic approach to identifying outcomes. To address this, we developed machine learning models in Clinical Trial Data (CTD) to identify a frequently used outcome, Major Adverse Cardiovascular Events (MACE). Model features were built from anonymized CTD sourced from the Medidata Enterprise Data Store and were selected on the condition that they would be useful for labelling MACE and could also be found in RWD. These features were used to train three random forest models, one for each component of 3-point MACE, to identify events across a patient's clinical trial journey. We present performance metrics for the models (recall = 0.72 [0.07], precision = 0.68 [0.12]; mean [SD]) along with the top contributing features. We show that the models can be tuned specifically to replicate the adjudication panels' results, and we present a cost-benefit analysis for deploying such models in clinical trial settings. We demonstrate the viability of using advanced algorithms to identify clinical outcomes in prospective clinical trials. Deploying such models could reduce the resources required to conduct RCTs, and extending them to RWD would facilitate approval of pragmatic clinical trials for regulatory submissions.
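The performance figures above summarize three per-component models as mean [SD]. A minimal sketch of that aggregation follows. It assumes that each model is scored against the adjudication panel's labels and that the SD is the sample standard deviation across the three components; neither detail is stated in the abstract. The component names follow the usual definition of 3-point MACE (cardiovascular death, non-fatal myocardial infarction, non-fatal stroke), and the confusion counts are invented for illustration, not taken from the study.

```python
from statistics import mean, stdev

# Hypothetical confusion counts (true positives, false positives, false negatives)
# for each of the three per-component models. These numbers are invented for
# illustration and are NOT the study's results.
counts = {
    "cardiovascular_death": {"tp": 40, "fp": 10, "fn": 10},
    "nonfatal_myocardial_infarction": {"tp": 30, "fp": 20, "fn": 10},
    "nonfatal_stroke": {"tp": 20, "fp": 10, "fn": 10},
}


def precision(c):
    # Fraction of model-flagged events that the adjudication panel confirmed.
    return c["tp"] / (c["tp"] + c["fp"])


def recall(c):
    # Fraction of adjudicated events that the model flagged.
    return c["tp"] / (c["tp"] + c["fn"])


def summarize(metric):
    # Assumption: mean and sample SD taken across the three component models.
    values = [metric(c) for c in counts.values()]
    return mean(values), stdev(values)


rec_mean, rec_sd = summarize(recall)
prec_mean, prec_sd = summarize(precision)
print(f"recall = {rec_mean:.2f} [{rec_sd:.2f}], precision = {prec_mean:.2f} [{prec_sd:.2f}]")
```

Reporting the spread across components, rather than pooling all events into a single confusion matrix, shows how consistently each outcome-specific model performs; pooling would instead weight the summary toward the most frequent component.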