Abstract

Medical data often contain sensitive personal information about individuals, which significantly limits their sharing or release for downstream learning and inferential tasks. We use normalizing flows (NF), a family of deep generative models, to estimate the probability density of a dataset under differential privacy (DP) guarantees, from which privacy-preserving synthetic data are generated and released. We apply the technique to an electronic health records dataset containing patients with pulmonary hypertension. We assess the learning and inferential utility of the synthetic data by comparing the accuracy of hypertension predictions and the variational posterior distributions of the parameters in a physics-based model. The results suggest that synthetic data generated via NF with DP can yield good utility at a reasonable privacy cost. Our study adds to the growing literature on the feasibility of generating synthetic medical data for sharing, and of obtaining inferences from medical data, using deep generative models with formal privacy guarantees.
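The DP guarantee for training a deep generative model such as a normalizing flow is typically obtained by clipping and noising gradients during optimization (the DP-SGD recipe). The sketch below illustrates one such private gradient step in NumPy; the function name, parameter values, and use of NumPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One differentially private gradient step (illustrative sketch):
    clip each per-example gradient to L2 norm <= clip_norm, sum the
    clipped gradients, add Gaussian noise with standard deviation
    noise_multiplier * clip_norm, and average over the batch."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Calibrated Gaussian noise masks any single example's contribution.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = [rng.normal(size=4) for _ in range(8)]
noisy_avg = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_avg.shape)  # (4,)
```

In practice the noise multiplier and clipping norm are tuned against a privacy accountant to hit a target (epsilon, delta) budget, which is what governs the privacy cost mentioned above.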
