Electricity marketing data is now used in more open settings, circulates more widely, and passes through more complex interaction contexts, so the risk of leakage persists throughout the entire data lifecycle. In this study, we introduce EPA-GAN, a novel electricity data anonymization model based on tabular data generation. Compared with existing methods, our model extends generative adversarial networks with feature encoders and feedback mechanisms. This allows it to generate anonymized data that is more practical and more similar to the original data, that is tailored to mixed data types, and that is deliberately decoupled from the source data. The proposed approach first parses the original JSON file and encodes each variable according to its type and features, using a distinct feature encoder for each. A generative adversarial network, enhanced with information, downstream, and generator losses and with the Wasserstein loss with gradient penalty (Was + GP) modification, is then used to generate the anonymized data. Random noise introduced during generation further strengthens privacy protection. Experiments show a marked reduction in both the machine learning utility difference and the statistical dissimilarity between the data synthesized by our anonymization model and the original dataset. This supports the model's ability to replace the original data in mining analysis and data sharing while effectively safeguarding the privacy of the source data.
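As a rough illustration of the parsing and type-dependent encoding step described above, the following minimal sketch reads a JSON array of flat records and encodes each variable by type. The function name `encode_records`, the min-max scaling of numeric fields to [-1, 1], and the one-hot encoding of all other fields are assumptions made for this example; the feature encoders actually used in EPA-GAN are not specified in this abstract and may differ.

```python
import json


def encode_records(raw_json):
    """Encode a JSON array of flat records into numeric row vectors.

    Assumption (not from the paper): numeric fields are min-max scaled
    to [-1, 1] and every other field is one-hot encoded.
    """
    records = json.loads(raw_json)
    columns = sorted(records[0].keys())

    # First pass: decide an encoder per column from the observed values.
    spec = {}
    for col in columns:
        values = [r[col] for r in records]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if numeric:
            spec[col] = ("continuous", min(values), max(values))
        else:
            spec[col] = ("categorical", sorted({str(v) for v in values}))

    # Second pass: apply the chosen encoder to every record.
    encoded = []
    for r in records:
        row = []
        for col in columns:
            if spec[col][0] == "continuous":
                _, lo, hi = spec[col]
                span = hi - lo
                # A constant column carries no information; map it to 0.
                row.append(0.0 if span == 0 else 2 * (r[col] - lo) / span - 1)
            else:
                categories = spec[col][1]
                row.extend(1.0 if str(r[col]) == c else 0.0 for c in categories)
        encoded.append(row)
    return spec, encoded
```

The returned `spec` records which encoder was applied to each column, which is what a matching decoder would need to map generated vectors back to the original mixed-type schema.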