ProcessGAN: Generating Privacy-Preserving Time-Aware Process Data with Conditional Generative Adversarial Nets

Keyi Li, Ivan Marsic, Sen Yang, Randall S Burd, Travis M Sullivan

doi:10.1145/3687464

Abstract

Process data constructed from event logs provides valuable insights into procedural dynamics over time. The confidential information in process data, together with the data’s intricate nature, prevents the datasets from being shared and makes them challenging to collect. Consequently, research using process data and analytics in the process mining domain is limited. In this study, we introduced a synthetic process data generation task to address the limited availability of sharable process data. We introduced a generative adversarial network, called ProcessGAN, to generate process data with activity sequences and corresponding timestamps. ProcessGAN consists of a transformer-based network as the generator and a time-aware self-attention network as the discriminator. It can generate privacy-preserving process data from random noise. ProcessGAN considers the duration of the process and the time intervals between activities to generate realistic activity sequences with timestamps. We evaluated ProcessGAN on five real-world datasets: two public datasets and three private datasets collected in medical domains. To evaluate the synthetic data, in addition to statistical metrics, we trained a supervised model to score the synthetic processes. We also used process mining to discover workflows for synthetic medical processes and had domain experts evaluate the clinical applicability of the synthetic workflows. ProcessGAN outperformed the existing generative models in generating complex processes with valid parallel pathways. The synthetic process data generated by ProcessGAN better represented the long-range dependencies between activities, a feature relevant to complicated medical and other processes. The timestamps generated by the ProcessGAN model had distributions similar to those of the authentic timestamps. In addition, we trained a transformer-based network to generate synthetic contexts (e.g., patient demographics) that were associated with the synthetic processes. The synthetic contexts generated by our model outperformed those of the baseline models, with distributions similar to those of the authentic contexts. We conclude that ProcessGAN can generate sharable synthetic process data indistinguishable from authentic data. Our source code is available at https://github.com/raaachli/ProcessGAN.
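
To make the data format concrete, the sketch below shows one plausible way to represent a single process trace as an activity sequence with timestamps, and to derive the two time features the abstract mentions: the total process duration and the intervals between consecutive activities. The activity names, the timestamps, and the `time_features` helper are hypothetical illustrations and are not taken from the ProcessGAN source code.

```python
from datetime import datetime

# Hypothetical trace: (activity, timestamp) pairs for one process instance.
# The activity names and times are invented for illustration only.
trace = [
    ("register", datetime(2024, 1, 1, 9, 0, 0)),
    ("triage", datetime(2024, 1, 1, 9, 5, 0)),
    ("exam", datetime(2024, 1, 1, 9, 20, 0)),
    ("discharge", datetime(2024, 1, 1, 10, 0, 0)),
]


def time_features(trace):
    """Return (total duration, gaps between consecutive activities), in seconds."""
    times = [t for _, t in trace]
    # Gap between each activity and the one that follows it.
    intervals = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    # Duration from the first to the last activity; 0.0 for an empty trace.
    duration = (times[-1] - times[0]).total_seconds() if times else 0.0
    return duration, intervals


duration, intervals = time_features(trace)
```

In a sketch like this, the intervals always sum to the duration, so a generated trace whose timestamps are out of order would show up as a negative interval.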

More From: ACM Transactions on Knowledge Discovery from Data

Similar Papers

Grid architecture for distributed process mining
01 Jan 2010

High-fidelity synthetic patient data applications and privacy considerations
Puja Myles ... Colin Mitchell
Journal of Data Protection & Privacy | VOL. 6
01 Jun 2024

A comprehensive benchmarking framework (CoBeFra) for conformance analysis between procedural process models and event logs in ProM
Seppe K.L.M Vanden Broucke ... Jochen De Weerdt
01 Apr 2013

An Investigation to Identify Factors that Lead to Delay in Healthcare Reimbursement Process: A Brazilian case
Ricardo Gerhardt ... José Vicente Canto Dos Santos
Big Data Research | VOL. 13
02 Mar 2018
