Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy

Chang Sun, Johan Van Soest, Michel Dumontier

doi:10.1016/j.jbi.2023.104404

Abstract

A large amount of personal health data that is highly valuable to the scientific community is still inaccessible, or requires a lengthy request process, due to privacy concerns and legal restrictions. Synthetic data has been studied and proposed as a promising alternative. However, generating realistic and privacy-preserving synthetic personal health data still poses challenges: simulating the characteristics of patients in the minority classes, capturing the relations among variables in imbalanced data and transferring them to the synthetic data, and preserving individual patients' privacy. In this paper, we propose a differentially private conditional Generative Adversarial Network model (DP-CGANS) consisting of data transformation, sampling, conditioning, and network training to generate realistic and privacy-preserving personal data. Our model distinguishes categorical and continuous variables and transforms them into latent space separately for better training performance. We tackle the challenges that arise from the special characteristics of personal health data. For example, patients with a certain disease are typically the minority in a dataset, and it is crucial to capture the relations among variables. Our model takes a conditional vector as an additional input to represent the minority class in the imbalanced data and to capture the dependency between variables as fully as possible. Moreover, we inject statistical noise into the gradients during the network training process of DP-CGANS to provide a differential privacy guarantee. We extensively evaluate our model against state-of-the-art generative models on personal socio-economic datasets and real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement. We demonstrate that our model outperforms other comparable models, especially in capturing the dependence between variables. Finally, we present the balance between data utility and privacy in synthetic data generation, considering the different data structures and characteristics of real-world personal health data such as imbalanced classes, abnormal distributions, and data sparsity.
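
The gradient-noise step mentioned in the abstract can be illustrated with a minimal sketch. It follows the commonly used DP-SGD recipe (clip each per-example gradient, sum, add Gaussian noise, average) and is not taken from the DP-CGANS implementation. The function names, `max_norm`, and `noise_multiplier` are assumptions made for illustration, and privacy accounting (tracking the privacy budget) is omitted.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Assumption: noise standard deviation is noise_multiplier * max_norm,
    as in the usual DP-SGD formulation; the paper may differ in detail.
    """
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Hypothetical per-example gradients for a two-parameter model.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
update = noisy_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

Clipping bounds how much any single patient record can change the update, and the added noise masks the remaining influence. A larger `noise_multiplier` gives stronger privacy at the cost of noisier training, which is the utility and privacy balance the abstract refers to.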

Journal: Journal of Biomedical Informatics
Publication Date: Jun 1, 2023
Citations: 12

Similar Papers

On the Fidelity-Privacy Tradeoff of Synthetic Cancer Registry Data.
Philipp Röchner
Studies in health technology and informatics | VOL. 316
22 Aug 2024

Synthetic Tabular Data Based on Generative Adversarial Networks in Health Care: Generation and Validation Using the Divide-and-Conquer Strategy.
Ha Ye Jin Kang ... Minsam Ko
JMIR medical informatics | VOL. 11
24 Nov 2023

Prediction for underground seismic intensity measures using conditional generative adversarial networks
Shuqian Duan ... Jiecheng Xiong
Soil Dynamics and Earthquake Engineering | VOL. 180
26 Mar 2024

Conditional generative adversarial network model for simulating intensity measures of aftershocks
Yinjun Ding ... Jiaxu Shen
Soil Dynamics and Earthquake Engineering | VOL. 139
06 Sep 2020
