Evaluating Synthetic Data Augmentation to Correct for Data Imbalance in Realistic Clinical Prediction Settings.

Nina Wahler, Bayrem Kaabachi, Bogdan Kulynych, Jérémie Despraz, Christian Simon, Jean Louis Raisaro

doi:10.3233/shti240563

Abstract

Predictive modeling holds great potential for clinical decision-making, yet its effectiveness can be hindered by the data imbalances inherent in clinical datasets. This study investigates whether synthetic data can improve the performance of predictive models on realistic, small, imbalanced clinical datasets. On four clinical datasets, we compared several synthetic data generation methods, including Generative Adversarial Networks, Normalizing Flows, and Variational Autoencoders, against standard baselines for correcting class underrepresentation. Although the results show improved F1 scores in some cases, even over multiple repetitions, we find no statistically significant evidence that synthetic data generation outperforms standard baselines for correcting class imbalance. This study challenges common beliefs about the efficacy of synthetic data for data augmentation and highlights the importance of evaluating new, complex methods against simple baselines.
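The abstract does not specify which standard baselines or which significance test were used. As an illustrative sketch only, the Python snippet below assumes random minority-class oversampling as the "standard baseline", scores the minority class with F1, and compares two methods' per-repetition F1 scores with a paired sign-flip permutation test. The helpers `random_oversample`, `minority_f1`, and `paired_permutation_test` are hypothetical names written for this example, not the authors' code, and the generative models (GANs, Normalizing Flows, VAEs) are not implemented here.

```python
import random


def random_oversample(X, y, minority=1, seed=0):
    """Assumed baseline: duplicate random minority-class rows until classes are equal.

    Assumes binary labels. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_missing = (len(y) - len(minority_idx)) - len(minority_idx)
    if not minority_idx or n_missing <= 0:
        return list(X), list(y)  # nothing to resample
    extra = [rng.choice(minority_idx) for _ in range(n_missing)]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


def minority_f1(y_true, y_pred, positive=1):
    """F1 score of the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def paired_permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo sign-flip test on per-repetition score differences.

    scores_a[i] and scores_b[i] must come from the same repetition
    (same data split and seed), so the differences are paired.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis each difference is equally likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Toy usage with made-up inputs (not data or results from the study):
X_bal, y_bal = random_oversample([[0.1], [0.2], [0.3], [0.9]], [0, 0, 0, 1])
print(y_bal.count(0), y_bal.count(1))  # 3 3
print(round(minority_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]), 3))  # 0.667
```

The test is paired because each repetition evaluates both methods on the same split. Note that with n repetitions an exact sign-flip test cannot return a two-sided p-value below 2/2^n, so a small number of repetitions limits the significance that can be reached even when one method wins every time.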

Journal: Studies in Health Technology and Informatics
Publication Date: Aug 22, 2024
License type: CC BY-NC

Similar Papers

Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets.
Samer El Kababji ... Ana-Alicia Beltran-Bless
JCO clinical cancer informatics | VOL. 7
01 Sep 2023

Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy
Chang Sun ... Michel Dumontier
Journal of Biomedical Informatics | VOL. 143
01 Jun 2023

Abstract 4927: Combining single-cell ATAC and RNA sequencing for supervised cell annotation
Jaidip Gill ... Natasha Markuzon
Cancer Research | VOL. 84
22 Mar 2024

EEG data augmentation: towards class imbalance problem in sleep staging tasks
Jiahao Fan ... Xinyu Jiang
Journal of Neural Engineering | VOL. 17
01 Oct 2020
