Mimicking Clinical Trials with Synthetic Acute Myeloid Leukemia Patients Using Generative Artificial Intelligence

Jan-Niklas Eckardt, Waldemar Hahn, Christoph Röllig, Sebastian Stasik, Uwe Platzbecker, Carsten Müller-Tidow, Hubert Serve, Claudia D Baldus, Christoph Schliemann, Kerstin Schäfer-Eckart, Maher Hanoun, Martin Kaufmann, Andreas Burchert, Christian Thiede, Johannes Schetelig, Martin Bornhäuser, Markus Wolfien, Jan Moritz Middeke

doi:10.1182/blood-2023-179817

Abstract

Data sharing is often hindered by concerns about patient privacy, regulatory aspects, and proprietary interests. This impedes scientific progress and establishes a gatekeeping mechanism in clinical medicine, since obtaining large data sets is costly and time-consuming. We employed two generative artificial intelligence (AI) technologies, CTAB-GAN+ and Normalizing Flows (NFlow), to synthesize clinical trial data based on pooled patient data from four previous multicenter clinical trials of the German Study Alliance Leukemia (AML96, AML2003, AML60+, SORAML) that enrolled adult patients (n=1606) with acute myeloid leukemia (AML) who received intensive induction therapy. As a generative adversarial network (GAN), CTAB-GAN+ consists of two adversarial networks: a generator producing synthetic samples from random noise and a discriminator aiming to distinguish between real and synthetic samples. The model converges when the discriminator can no longer reliably differentiate between real and synthetic data. In contrast, NFlow consists of a sequence of invertible transformations (flows) that start from a simple base distribution and gradually add complexity to better mirror the training data. Both models were trained on tabular data including demographic, laboratory, molecular genetic, and cytogenetic patient variables. Detection of molecular alterations in the original cohort was performed via next-generation sequencing (NGS) using the TruSight Myeloid Sequencing Panel (Illumina, San Diego, CA, USA) with a 5% variant-allele frequency (VAF) mutation calling cut-off. For cytogenetics, standard techniques for chromosome banding and fluorescence in situ hybridization (FISH) were used. Hyperparameter tuning of the generative models was conducted using the Optuna framework. For each model, we used a total of 70 optimization trials to optimize a custom score inspired by TabSynDex, which assesses both the resemblance of the synthetic data to the real training data and its utility.
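To make the idea of a flow concrete, the following is a minimal one-dimensional sketch in Python. It is not the NFlow model used in the study: the class names (`Affine`, `Exp`, `Flow`) are hypothetical, the transformation parameters are fixed rather than learned, and real models stack many trainable layers over all patient variables at once. It only illustrates the two operations that invertibility makes possible: sampling by pushing base noise forward, and exact density evaluation by inverting the layers and summing the log-Jacobian terms.

```python
import math
import random


class Affine:
    """Invertible layer y = scale * x + shift (requires scale != 0)."""

    def __init__(self, scale, shift):
        if scale == 0:
            raise ValueError("scale must be non-zero for invertibility")
        self.scale, self.shift = scale, shift

    def forward(self, x):
        return self.scale * x + self.shift

    def inverse(self, y):
        return (y - self.shift) / self.scale

    def log_abs_det(self, x):
        # log |dy/dx| is constant for an affine map
        return math.log(abs(self.scale))


class Exp:
    """Invertible layer y = exp(x); maps the real line to positive values."""

    def forward(self, x):
        return math.exp(x)

    def inverse(self, y):
        return math.log(y)

    def log_abs_det(self, x):
        # log |d exp(x) / dx| = x
        return x


def std_normal_logpdf(z):
    """Log-density of the simple base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2 * math.pi))


class Flow:
    """A sequence of invertible layers applied to standard normal noise."""

    def __init__(self, layers):
        self.layers = layers

    def sample(self, rng):
        x = rng.gauss(0.0, 1.0)
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def log_prob(self, y):
        # Change of variables: undo the layers in reverse order and
        # subtract each layer's log-Jacobian from the base log-density.
        x, log_det = y, 0.0
        for layer in reversed(self.layers):
            x = layer.inverse(x)
            log_det += layer.log_abs_det(x)
        return std_normal_logpdf(x) - log_det


# Illustrative parameters only (assumed, not taken from the study).
flow = Flow([Affine(0.5, 1.0), Exp()])
rng = random.Random(0)
samples = [flow.sample(rng) for _ in range(5)]
print("synthetic values:", [round(s, 3) for s in samples])
print("log-density at 2.0:", round(flow.log_prob(2.0), 4))
```

With these two layers the flow reproduces a log-normal distribution, a shape often seen in skewed laboratory values; training a flow amounts to adjusting the layer parameters so that `log_prob` is high on the real data.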
Pairwise analyses were conducted between the original data set and each of the two synthetic data sets. All tests were carried out as two-sided tests using a significance level α of 0.05. Table 1 summarizes baseline patient characteristics and outcome for both synthetic cohorts compared to the original cohort. First, both models adequately represented patient features, although some individual variables deviated significantly from the original cohort. Note that with such a large sample size (n=1606 for each cohort), even minuscule differences can reach statistical significance without reflecting a clinically meaningful difference. Interestingly, the variables that deviated from the original distribution differed between the two models, indicating that model architecture plays a vital role in sample representation: while CTAB-GAN+ showed significant deviations for both age and sex, NFlow showed significant deviations for AML status. The complete remission rate was similar between the original cohort (70.7%, odds ratio [OR]: 2.41) and both CTAB-GAN+ (73.7%, OR: 2.81, p=0.059) and NFlow (69.1%, OR: 2.24, p=0.356). For event-free survival (EFS), which was not included as a target in hyperparameter tuning, both networks deviated significantly from the original cohort (original: median 7.2 months, hazard ratio [HR]: 1.36; CTAB-GAN+: median 12.8 months, HR: 0.74, p<0.001; NFlow: median 9.0 months, HR: 0.87, p=0.001). Overall survival (OS) was well represented by NFlow compared to the original cohort, while CTAB-GAN+ showed a significant deviation (original: median 17.5 months, HR: 1.14; CTAB-GAN+: median 19.5 months, HR: 0.88, p<0.001; NFlow: median 16.2 months, HR: 1.00, p=0.055). Both models adequately reproduced the survival curves in Kaplan-Meier analysis (Figure 1).
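The remark about sample size can be illustrated with a short Python sketch based on the rounded complete remission rates quoted above. The abstract does not state which test produced the reported p-values, so the pooled two-proportion z-test used here is an assumption, and the helper names (`odds`, `two_proportion_z_test`) are hypothetical. Because the inputs are rounded percentages, the results can differ slightly from the reported figures. The sketch also computes the plain odds, rate / (1 - rate), which come out close to the values listed as OR; this is an arithmetic observation, not something the abstract states.

```python
import math


def odds(rate):
    """Odds corresponding to a proportion, e.g. a remission rate."""
    return rate / (1.0 - rate)


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided pooled z-test for the difference of two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area
    return z, p_value


N = 1606  # patients per cohort, as reported
cr_rate = {"original": 0.707, "CTAB-GAN+": 0.737, "NFlow": 0.691}

for name, rate in cr_rate.items():
    print(f"{name}: odds of complete remission = {odds(rate):.2f}")

for name in ("CTAB-GAN+", "NFlow"):
    z, p = two_proportion_z_test(cr_rate[name], N, cr_rate["original"], N)
    print(f"{name} vs original: z = {z:.2f}, p = {p:.3f}")

# The same 3-percentage-point difference at different cohort sizes:
for n in (200, N, 16060):
    z, p = two_proportion_z_test(0.737, n, 0.707, n)
    print(f"n = {n} per cohort: z = {z:.2f}, p = {p:.3g}")
```

In the last loop only the cohort size changes: the identical difference is far from significant with 200 patients per cohort and highly significant with ten times the actual cohort size, which is why statistical significance alone says little about clinical relevance here.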
Using two different generative AI technologies, we demonstrate that synthetic data generation provides an attractive way to circumvent issues in current standards of data collection and sharing. It effectively bypasses logistical, organizational, and financial burdens, as well as regulatory and ethical concerns. Ultimately, this enables exploratory research into previously inaccessible data sets and offers the prospect of fully synthetic control arms in prospective clinical trials.
