Electric load forecasting is crucial to the planning and operation of electric power companies. It has evolved from statistical methods to artificial intelligence-based techniques that use machine learning models. In this study, we investigate short-term load forecasting (STLF) for large-scale electricity usage datasets. We propose a new prediction model for STLF that combines data clustering and dimensionality reduction to handle large-scale electricity usage data effectively. We adopt k-means for data clustering, and kernel principal component analysis (kernel PCA), uniform manifold approximation and projection (UMAP), and t-distributed stochastic neighbor embedding (t-SNE) for dimensionality reduction. To verify the effectiveness of the proposed model, we apply it to several neural network-based models and compare its performance against the baseline models using actual electricity usage data from 4710 households. Experimental results demonstrate that data clustering combined with dimensionality reduction improves the performance of the baseline models. In terms of mean absolute percentage error (MAPE), the prediction accuracy of the proposed method is 1.01–1.76 times better than that of the existing methods for summer data and 1.03–1.36 times better for winter data.
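To make the reported metric concrete, the following minimal Python sketch computes MAPE and an improvement factor between two forecasts. It rests on assumptions: the abstract does not state how the "times" figures are derived, so the ratio of baseline MAPE to proposed MAPE used here is only one plausible reading, and the function names and load values are hypothetical rather than taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def improvement_factor(baseline_mape, proposed_mape):
    """Assumed reading of 'N times better': baseline MAPE / proposed MAPE."""
    return baseline_mape / proposed_mape


# Hypothetical hourly loads (kWh) and two hypothetical forecasts,
# invented for illustration only.
actual = [100.0, 200.0, 400.0]
baseline_forecast = [110.0, 180.0, 440.0]   # each value off by 10%
proposed_forecast = [105.0, 190.0, 420.0]   # each value off by 5%

baseline_error = mape(actual, baseline_forecast)
proposed_error = mape(actual, proposed_forecast)
factor = improvement_factor(baseline_error, proposed_error)

print(f"baseline MAPE: {baseline_error:.2f}%")
print(f"proposed MAPE: {proposed_error:.2f}%")
print(f"improvement factor: {factor:.2f}")
```

Under this reading, a factor above 1 means the proposed forecast has a lower MAPE than the baseline. Because MAPE divides by the actual load, it is undefined for intervals with zero consumption, which is why the sketch rejects such inputs instead of silently skipping them.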