Optimizing the synthesis of clinical trial data using sequential trees.

Khaled El Emam, Lucy Mosquera, Chaoyi Zheng

doi:10.1093/jamia/ocaa249

Abstract

Objective

With the growing demand for sharing clinical trial data, scalable methods that enable privacy-protective access to high-utility data are needed. Data synthesis is one such method. Sequential trees are commonly used to synthesize health data, and it is hypothesized that the utility of the generated data depends on the order in which the variables are synthesized. No assessment of the impact of variable order on synthesized clinical trial data has been performed thus far. Through simulation, we aim to evaluate the variability in the utility of synthetic clinical trial data as the variable order is randomly shuffled, and to implement an optimization algorithm that finds a good order if this variability is too high.

Materials and Methods

Six oncology clinical trial datasets were evaluated in a simulation. Three utility metrics were computed comparing real and synthetic data: univariate similarity, similarity in multivariate prediction accuracy, and a distinguishability metric. Particle swarm optimization was implemented to optimize the variable order and was compared with a curriculum learning approach to ordering variables.

Results

As the number of variables in a clinical trial dataset increases, the variability of data utility across variable orders tends to increase markedly. Particle swarm optimization with a distinguishability hinge loss ensured adequate utility across all 6 datasets. The hinge threshold was selected to avoid overfitting, which can create a privacy problem. This approach yielded higher utility than curriculum learning.

Conclusions

The optimization approach presented in this study provides a reliable way to synthesize high-utility clinical trial datasets.
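A minimal sketch of the kind of search described above, not the authors' implementation: each candidate variable order is a particle, decoded from a vector of continuous "random keys" (an assumed encoding; the abstract does not say how orders are represented), and the swarm minimizes a hinge loss max(0, D - τ) on a distinguishability score D with threshold τ. The function `toy_distinguishability` is a hypothetical stand-in; in the study, scoring an order would require synthesizing data with sequential trees in that order and measuring how well real and synthetic records can be told apart. The inertia and acceleration coefficients are conventional PSO defaults, not values from the paper.

```python
import random
from typing import Callable, List, Sequence


def hinge_loss(distinguishability: float, threshold: float) -> float:
    # Penalize only distinguishability above the threshold; orders that are
    # already "indistinguishable enough" all score 0 and are treated as equal.
    return max(0.0, distinguishability - threshold)


def keys_to_order(keys: Sequence[float]) -> List[int]:
    # Random-key decoding (assumed encoding): the variable with the smallest
    # key is synthesized first, the next smallest second, and so on.
    return sorted(range(len(keys)), key=lambda i: keys[i])


def pso_variable_order(
    n_vars: int,
    score: Callable[[List[int]], float],
    threshold: float,
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,    # inertia weight (conventional default)
    c1: float = 1.5,   # pull toward each particle's own best position
    c2: float = 1.5,   # pull toward the swarm's best position
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)

    def loss(keys: Sequence[float]) -> float:
        return hinge_loss(score(keys_to_order(keys)), threshold)

    # Random initial keys, zero initial velocities.
    pos = [[rng.random() for _ in range(n_vars)] for _ in range(n_particles)]
    vel = [[0.0] * n_vars for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_loss[i])
    g_pos, g_loss = best_pos[g][:], best_loss[g]

    for _ in range(n_iters):
        if g_loss == 0.0:
            break  # an order below the threshold has been found
        for i in range(n_particles):
            for d in range(n_vars):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            current = loss(pos[i])
            if current < best_loss[i]:
                best_pos[i], best_loss[i] = pos[i][:], current
                if current < g_loss:
                    g_pos, g_loss = pos[i][:], current
    return keys_to_order(g_pos)


def toy_distinguishability(order: List[int]) -> float:
    # Hypothetical stand-in for "synthesize in this order, then measure
    # distinguishability": the fraction of variable pairs that are out of
    # place relative to the order 0, 1, ..., n-1 (0 = best, 1 = worst).
    n = len(order)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if order[a] > order[b])
    return inversions / (n * (n - 1) / 2)


order = pso_variable_order(8, toy_distinguishability, threshold=0.1)
print(order, toy_distinguishability(order))
```

Because the hinge loss is zero for every order whose distinguishability falls below τ, the search stops as soon as it finds such an order instead of pushing the synthetic data ever closer to the real data; this mirrors the abstract's rationale for choosing the threshold to avoid overfitting.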

Journal: Journal of the American Medical Informatics Association: JAMIA
Publication Date: Nov 13, 2020
Citations: 30
License type: CC BY-NC-ND 4.0

Similar Papers

Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets.
Samer El Kababji ... Ana-Alicia Beltran-Bless
JCO Clinical Cancer Informatics, Vol. 7, 01 Sep 2023

Comparison of Synthetic Data Generation Techniques for Control Group Survival Data in Oncology Clinical Trials: Simulation Study.
Ippei Akiya ... Keiichi Yamamoto
JMIR Medical Informatics, Vol. 12, 18 Jun 2024

Protecting patient privacy when sharing patient-level data from clinical trials.
Katherine Tucker ... Mark J Nixon
BMC Medical Research Methodology, Vol. 16, Suppl 1, 01 Jul 2016

Smoking, The Missing Drug Interaction in Clinical Trials: Ignoring the Obvious
Ellen R Gritz ... Carolyn Dresler
Cancer Epidemiology, Biomarkers & Prevention, Vol. 14, 01 Oct 2005