IGAMT: Privacy-Preserving Electronic Health Record Synthesization with Heterogeneity and Irregularity

Wenjie Wang,Li Xiong,Yi-An Ko,Lance Waller,Jian Lou,Pengfei Tang,Yuanming Shao

doi:10.1609/aaai.v38i14.29491

Abstract

Integrating electronic health records (EHR) into machine learning-driven clinical research and hospital applications is important, as it harnesses extensive and high-quality patient data to enhance outcome predictions and treatment personalization. Nonetheless, due to privacy and security concerns, the secondary purpose of EHR data is consistently governed and regulated, primarily for research intentions, thereby constraining researchers' access to EHR data. Generating synthetic EHR data with deep learning methods is a viable and promising approach to mitigate privacy concerns, offering not only a supplementary resource for downstream applications but also sidestepping the confidentiality risks associated with real patient data. While prior efforts have concentrated on EHR data synthesis, significant challenges persist in the domain of generating synthetic EHR data: balancing the heterogeneity of real EHR including temporal and non-temporal features, addressing the missing values and irregular measures, and ensuring the privacy of the real data used for model training. Existing works in this domain only focused on solving one or two aforementioned challenges. In this work, we propose IGAMT, an innovative framework to generate privacy-preserved synthetic EHR data that not only maintain high quality with heterogeneous features, missing values, and irregular measures but also balances the privacy-utility trade-off. Extensive experiments prove that IGAMT significantly outperforms baseline architectures in terms of visual resemblance and comparable performance in downstream applications. Ablation case studies also prove the effectiveness of the techniques applied in IGAMT.

Full Text