Abstract
Medical data often contain sensitive personal information about individuals, which significantly limits how such data can be shared or released for downstream learning and inference tasks. We use normalizing flows (NF), a family of deep generative models, to estimate the probability density of a dataset under differential privacy (DP) guarantees, from which privacy-preserving synthetic data are generated and released. We apply the technique to an electronic health records dataset of patients with pulmonary hypertension. We assess the learning and inferential utility of the synthetic data by comparing the accuracy of hypertension predictions and the variational posterior distributions of the parameters in a physics-based model. The results suggest that synthetic data generated via NF with DP can yield good utility at a reasonable privacy cost. Our study adds to the growing body of evidence that deep generative models with formal privacy guarantees are a feasible means of generating shareable synthetic medical data and of obtaining inferences from medical data.
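The pipeline the abstract describes (fit a density model to private data with a differentially private optimizer, then release samples from the model) can be illustrated with a deliberately tiny sketch. The example below is not the paper's method: it replaces the normalizing flow with a one-parameter-pair affine transform of a standard Gaussian (the simplest possible flow, trained by maximum likelihood via the change-of-variables formula) and uses hand-rolled DP-SGD (per-example gradient clipping plus Gaussian noise). All names, hyperparameters, and the toy dataset are assumptions for illustration; the noise multiplier is not calibrated to any particular epsilon.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for sensitive data: 1-D samples from N(3, 1.5^2).
data = rng.normal(3.0, 1.5, size=2000)

# "Flow" parameters: x = mu + exp(s) * z with z ~ N(0, 1).
# NLL(x) = 0.5 * z^2 + s + const, where z = (x - mu) * exp(-s)
# (change-of-variables log-density of the affine transform).
mu, s = 0.0, 0.0

# Illustrative DP-SGD hyperparameters (assumed, not calibrated).
clip_norm = 1.0   # per-example gradient clipping bound C
noise_mult = 1.0  # Gaussian noise multiplier
lr, batch, steps = 0.05, 100, 800

for _ in range(steps):
    x = rng.choice(data, size=batch, replace=False)
    z = (x - mu) * np.exp(-s)
    # Per-example gradients of the NLL w.r.t. (mu, s).
    g_mu = -z * np.exp(-s)
    g_s = 1.0 - z ** 2
    g = np.stack([g_mu, g_s], axis=1)              # shape (batch, 2)
    # Clip each example's gradient to norm <= clip_norm.
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g / np.maximum(1.0, norms / clip_norm)
    # Average, then add calibrated Gaussian noise: the DP step.
    g_bar = g.mean(axis=0)
    g_bar += rng.normal(0.0, noise_mult * clip_norm / batch, size=2)
    mu -= lr * g_bar[0]
    s -= lr * g_bar[1]

# Release synthetic data by sampling the trained model.
synthetic = mu + np.exp(s) * rng.normal(size=2000)
```

A real normalizing flow stacks many invertible layers (e.g. affine couplings) and, in practice, the per-example clipping and noise would come from a DP library rather than be written by hand, with the privacy budget accounted across all training steps.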