Can synthetic data accurately mimic oncology clinical trials?

Samer El Kababji, Gregory Russell Pond, Lucy Mosquera, Dhenuka Radhakrishnan, Mark J Clemons, Ana-Alicia Beltran-Bless, Lisa Vandermeer, Khaled El Emam, Nicholas Mitsakakis, Xi Fang

doi:10.1200/jco.2023.41.16_suppl.1554

Abstract

1554 Background: There is strong interest among researchers, the pharmaceutical industry, medical journal editors, research funders, and regulators in sharing clinical trial data. Reusing data extracts the most utility possible from patient contributions, and the majority of patients want their data shared for secondary research purposes. However, data access for secondary analysis remains a challenge. A key reason authors and data custodians do not make individual-level data directly available to data users is concern over breaches of patient privacy. Synthetic data generation (SDG) is an effective way to address privacy concerns and can enable broader sharing of clinical trial datasets. However, a key question is whether analyses of the generated data reproduce the original results well enough to draw reliable conclusions.

Methods: We synthesized datasets from five pragmatic breast cancer clinical trials performed by the REaCT group (https://react.ohri.ca/) using sequential synthesis, a type of machine learning method. The published analysis of each trial was repeated on each synthetic dataset to evaluate reproducibility. We evaluated reproducibility on three criteria: (a) decision agreement: the direction and statistical significance of the primary endpoint effect estimate are the same as in the real data; (b) estimate agreement: the parameter estimates from the synthetic data fall within the 95% confidence intervals of the real data estimates; and (c) confidence interval overlap: the overlap between the real and synthetic confidence intervals is above 50%. In addition, we evaluated privacy using a membership disclosure metric, which measures the ability of an adversary to determine, using the synthetic data, that a target individual was in the original dataset. It is computed as an F1 classification score.

Results: Decision agreement and estimate agreement held across all five trials, and the confidence interval overlap was high. The risks of membership disclosure were all below the established 0.2 threshold.

Conclusions: In this study, we successfully generated synthetic datasets that accurately replicated the original data from five oncology trials and yielded the same results as the original published studies, with a very low risk of membership disclosure. With proper modeling techniques, synthetic datasets can play a key role in data democratization and the reuse of oncology clinical trial data. [Table: see text]
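The three reproducibility criteria and the F1-based privacy check named in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the `Estimate` class, the function names, the use of a confidence interval excluding the null value as the significance test, and the overlap formula (the shared interval length averaged over both interval widths, one common definition). The abstract does not say how the F1 score is derived or adjusted before comparison with the 0.2 threshold, so `f1_score` below is simply the standard formula from hypothetical counts. The numbers are invented and are not trial results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Effect estimate with its 95% confidence interval (hypothetical container)."""
    point: float
    lower: float
    upper: float


def is_significant(e: Estimate, null: float = 0.0) -> bool:
    # Assumption: "significant" means the CI excludes the null value.
    return not (e.lower <= null <= e.upper)


def decision_agreement(real: Estimate, synth: Estimate, null: float = 0.0) -> bool:
    # Same direction of effect and same significance conclusion.
    same_direction = (real.point > null) == (synth.point > null)
    same_significance = is_significant(real, null) == is_significant(synth, null)
    return same_direction and same_significance


def estimate_agreement(real: Estimate, synth: Estimate) -> bool:
    # Synthetic point estimate lies inside the real-data 95% CI.
    return real.lower <= synth.point <= real.upper


def ci_overlap(real: Estimate, synth: Estimate) -> float:
    # Assumed definition: shared length averaged over both interval widths,
    # with disjoint intervals scored as 0.
    shared = max(0.0, min(real.upper, synth.upper) - max(real.lower, synth.lower))
    return 0.5 * (shared / (real.upper - real.lower)
                  + shared / (synth.upper - synth.lower))


def f1_score(tp: int, fp: int, fn: int) -> float:
    # Standard F1 from true positives, false positives, and false negatives.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented example values, not taken from the trials.
real = Estimate(point=0.10, lower=-0.05, upper=0.25)
synth = Estimate(point=0.08, lower=-0.08, upper=0.24)

print(decision_agreement(real, synth))
print(estimate_agreement(real, synth))
print(ci_overlap(real, synth) > 0.5)
print(f1_score(tp=10, fp=90, fn=90) < 0.2)
```

For ratio-scale endpoints such as hazard ratios or odds ratios, the `null` argument would be 1.0 rather than 0.0.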


Journal: Journal of Clinical Oncology
Publication Date: Jun 1, 2023
Citations: 2
