Abstract
Differentially private (DP) generative adversarial networks (GANs) can generate protected synthetic samples for downstream analysis. However, training on unbalanced datasets can bias the network towards majority classes, leaving minority classes undertrained. Meanwhile, gradient perturbation in DP offers no guarantee of perfect protection for individual data points. Due to noisy gradients, training can converge to a suboptimum, or offer no protection when it encounters a noise equilibrium. To address these issues, this work proposes a balanced Two-Stage DP-GAN (TS-DPGAN) framework. In Stage I, we use a data balancing algorithm with sampling techniques to reduce the bias and learn features from previously undertrained classes. Compared to a sampling strategy with a fixed reference, a reference interval is introduced to reduce duplication in oversampling and information loss in undersampling. The framework then perturbs the balanced samples directly, rather than the gradients, to achieve data-wise DP and improve sample diversity. Since data balancing uniformizes the class distribution, a feature-holding strategy is used in Stage II to keep important features from Stage I while restoring the original data distribution. Simulations show that our framework outperforms state-of-the-art algorithms in image quality, distribution preservation, and convergence.
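The abstract does not spell out the Stage I balancing routine; the following is a minimal sketch of interval-based resampling under stated assumptions. The dataset `(X, y)` of samples and integer class labels, and the bounds `low`/`high` defining the reference interval, are illustrative names, not the paper's notation.

```python
import numpy as np

def balance_with_interval(X, y, low, high, rng=None):
    """Resample each class so its count falls inside [low, high].

    Classes below `low` are oversampled (with replacement); classes
    above `high` are undersampled (without replacement); classes
    already inside the interval are left untouched. Relative to a
    single fixed reference count, the interval limits duplication
    from oversampling and information loss from undersampling.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    Xb, yb = [], []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        if len(idx) < low:
            # Minority class: oversample up to the lower bound.
            idx = rng.choice(idx, size=low, replace=True)
        elif len(idx) > high:
            # Majority class: undersample down to the upper bound.
            idx = rng.choice(idx, size=high, replace=False)
        Xb.append(X[idx])
        yb.append(y[idx])
    return np.concatenate(Xb), np.concatenate(yb)
```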
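The abstract states that the balanced samples, rather than the gradients, are perturbed to achieve data-wise DP, but it does not name the mechanism. The sketch below illustrates one standard choice, a sample-wise Gaussian mechanism with L2 clipping; `clip_norm` and `sigma` are assumed parameters for illustration and may differ from the paper's construction.

```python
import numpy as np

def perturb_samples(X, clip_norm, sigma, rng=None):
    """Sample-wise Gaussian mechanism: clip each flattened sample to
    an L2 norm of `clip_norm`, then add isotropic Gaussian noise with
    standard deviation sigma * clip_norm. The noise is injected into
    the data the GAN trains on, not into its gradients.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    flat = X.reshape(len(X), -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    # Scale down any sample whose norm exceeds the clipping bound.
    clipped = flat * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    noisy = clipped + rng.normal(0.0, sigma * clip_norm, size=clipped.shape)
    return noisy.reshape(X.shape)
```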