Residential energy consumption data and related sociodemographic information are critical for energy demand management, including providing personalized services, ensuring energy supply, and designing demand response programs. However, it is often difficult to collect sufficient data to build machine learning models, primarily because of cost, technical barriers, and privacy concerns. Synthetic data generation is a feasible solution to this data availability problem, but most existing work generates data without considering the balance between usability and privacy. In this paper, we first propose a data generation model based on the Wasserstein Deep Convolutional Generative Adversarial Network (WDCGAN), which is capable of synthesizing fine-grained energy consumption time series and the corresponding sociodemographic information. A hyperparameter set during training lets the WDCGAN model balance data usability against privacy level while still generating realistic data. Next, we take the classification of sociodemographic information as an application example and train four classical classification models, CNN, LSTM, SVM, and LightGBM, on the generated datasets. We evaluate the proposed data generator on Irish data, and the results show that the proposed WDCGAN model can generate realistic load profiles with satisfactory similarity in terms of data distribution, patterns, and performance. The classification results validate the usability of the generated data for real-world machine learning applications with a privacy guarantee; e.g., most of the differences in classification accuracy and F1 score between using real and synthesized data are less than 8%.