Abstract

Machine learning algorithms have been applied intensively to load forecasting because they achieve better accuracy than traditional statistical methods. However, with the huge increase in data size, sophisticated models must be created, and these require big data platforms. The available computational resources are used most effectively when the utilization of the cluster nodes is maximized, so parallel computing is needed to handle smart grid big data. In this paper, a master-slave parallel computing paradigm is applied and evaluated for load forecasting in a multi-AMI environment. The paper proposes a concurrent job scheduling algorithm for a multi-energy data source environment using Apache Spark. An efficient resource utilization strategy is proposed for submitting multiple Spark jobs so as to reduce job completion time. An optimal clustering value is used to cluster the data into groups, which further reduces computational time. Multiple tree-based machine learning algorithms are tested with parallel computation to evaluate their performance with tunable parameters on a real-world dataset. Three years of real data from one thousand distribution transformers in Spain are used to demonstrate the performance of the proposed methodology, with a trade-off between accuracy and processing time.

Highlights

  • With the development of the smart infrastructure in the electrical grids, the data collected from various units and locations over time have begun to receive the attention of grid operators and research centers

  • The proposed approach was implemented on Apache Spark to address the computation-time challenges of handling big data and to submit jobs in parallel using an optimized methodology

  • A large number of DT training procedures were performed with reduced run-times, which allows handling big data that is too large to be stored

Summary

INTRODUCTION

With the development of the smart infrastructure in the electrical grids, the data collected from various units and locations over time have begun to receive the attention of grid operators and research centers. Splitting a deluge of data into multiple datasets for training ML models has significantly improved the learning process in the big data context. A novel scheduling technique built on the Apache Spark platform is proposed to forecast the short- and long-term load of all one thousand transformers simultaneously. The proposed method performs load forecasting by submitting multiple jobs concurrently on the datasets while utilizing the cluster resources optimally. The main contributions of this paper can be summarized as follows: 1) Proposing an optimal scheduling algorithm to perform load forecasting with parallel and distributed execution in a multi-AMI environment on smart grid big data.

VOLUME 9, 2021

Scheduling algorithm proposed to perform load forecasting on multiple datasets utilizing Apache Spark.
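The idea of submitting several forecasting jobs concurrently, one per dataset, can be illustrated with a minimal sketch. It is not the paper's scheduling algorithm: it uses only the Python standard library, and the names `run_forecast_job`, `submit_concurrently`, and `max_concurrent_jobs` are hypothetical. The job function is a stand-in for a real Spark job. In PySpark, a comparable effect is usually obtained by triggering actions from several threads that share one session, optionally with `spark.scheduler.mode` set to `FAIR`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_forecast_job(dataset_id):
    """Hypothetical stand-in for one forecasting job on one dataset.

    A real implementation would train a model on the cluster here;
    the sleep only simulates the time such a job would take.
    """
    time.sleep(0.05)
    return dataset_id, f"model-for-{dataset_id}"


def submit_concurrently(dataset_ids, max_concurrent_jobs, job=run_forecast_job):
    """Run one job per dataset, with at most max_concurrent_jobs in flight.

    The cap is an assumed tuning knob: it should reflect how many jobs
    the cluster can serve at once without starving each other.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_concurrent_jobs) as pool:
        futures = [pool.submit(job, dataset_id) for dataset_id in dataset_ids]
        # Collect results as jobs finish, not in submission order.
        for future in as_completed(futures):
            dataset_id, model = future.result()
            results[dataset_id] = model
    return results


if __name__ == "__main__":
    # Hypothetical dataset identifiers, e.g. one per group of transformers.
    datasets = [f"group-{i}" for i in range(8)]
    models = submit_concurrently(datasets, max_concurrent_jobs=4)
    print(sorted(models))
```

Capping the number of in-flight jobs is one simple way to trade job completion time against resource contention; the right cap depends on the cluster and is left here as an assumption.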

RELATED WORK
OPTIMAL SCHEDULING ALGORITHM
CONSIDERING COMMUNICATION COSTS
OBJECTIVE FUNCTION
PERFORMANCE EVALUATION
Findings
CONCLUSION
