Abstract

Differentially private (DP) generative adversarial networks (GANs) can generate protected synthetic samples for downstream analysis. However, training on imbalanced datasets can bias the network toward majority classes, leaving minority classes undertrained. Meanwhile, gradient perturbation in DP offers no guarantee of perfect protection for individual data points. Due to noisy gradients, training can converge to a suboptimum, or offer no protection when a noise equilibrium is encountered. To address these issues, this work proposes a balanced Two-Stage DP-GAN (TS-DPGAN) framework. In Stage I, we use a data-balancing algorithm with sampling techniques to reduce the bias and learn features from previously undertrained classes. Compared to a sampling strategy with a fixed reference, a reference interval is introduced to reduce duplication in oversampling and information loss in undersampling. The framework then perturbs the balanced samples directly, rather than the gradients, to achieve data-wise DP and improve sample diversity. Since data balancing uniformizes the class distribution, a feature-holding strategy is used in Stage II to retain important features from Stage I while restoring the original data distribution. Simulations show that our framework outperforms state-of-the-art (SOTA) algorithms in image quality, distribution preservation, and convergence.
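The two Stage I operations the abstract describes, reference-interval balancing and direct perturbation of the balanced samples, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the interval bounds low/high, and the use of the standard Gaussian mechanism to calibrate the data-wise noise are all illustrative assumptions based only on the abstract.

```python
import numpy as np

def balance_with_reference_interval(X, y, low, high, rng=None):
    """Resample each class so its count falls inside [low, high]:
    oversample classes below low, undersample classes above high,
    and leave classes already inside the interval untouched."""
    rng = rng or np.random.default_rng(0)
    Xb, yb = [], []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        if len(idx) < low:
            # Minority class: oversample with replacement up to the lower bound.
            idx = rng.choice(idx, size=low, replace=True)
        elif len(idx) > high:
            # Majority class: undersample without replacement down to the upper bound.
            idx = rng.choice(idx, size=high, replace=False)
        Xb.append(X[idx])
        yb.append(y[idx])
    return np.concatenate(Xb), np.concatenate(yb)

def perturb_samples(X, epsilon, delta, sensitivity, rng=None):
    """Data-wise DP sketch: noise is added to the balanced samples
    themselves rather than to training gradients, here via the
    standard (epsilon, delta) Gaussian mechanism (an assumption)."""
    rng = rng or np.random.default_rng(1)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return X + rng.normal(0.0, sigma, size=X.shape)
```

Leaving classes that already fall inside the interval untouched is what distinguishes a reference interval from a single fixed reference count: fewer minority samples are duplicated and fewer majority samples are discarded, which matches the duplication and information-loss reductions the abstract claims.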


