Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark

Ameema Zainab, Othmane Bouhali, Haitham Abu-Rub, Shady S Refaat, Ali Ghrayeb

doi:10.1109/access.2021.3072609

Open Access

Abstract

Machine learning algorithms have been applied extensively to load forecasting to achieve better accuracy than traditional statistical methods. However, as data volumes grow, more sophisticated models are needed, and these require big data platforms. Optimal and effective use of the available computational resources can be attained by maximizing the utilization of the cluster nodes, and parallel computing is required to reach this level of utilization when dealing with smart grid big data. In this paper, a master-slave parallel computing paradigm is used and evaluated for load forecasting in a multi-AMI environment. The paper proposes a concurrent job scheduling algorithm for a multi-energy-data-source environment using Apache Spark, together with an efficient resource utilization strategy for submitting multiple Spark jobs that reduces job completion time. To reduce computational time further, the data are grouped using the optimal number of clusters. Multiple tree-based machine learning algorithms are tested with parallel computation to evaluate their performance under tunable parameters on a real-world dataset. Three years of real data from one thousand distribution transformers in Spain are used to demonstrate the performance of the proposed methodology and its trade-off between accuracy and processing time.
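The abstract states that the data are grouped into an optimal number of clusters to cut computation time, but it does not name the clustering method or the features used. Assuming k-means on a single per-transformer feature (for example, mean load), the sketch below runs Lloyd's algorithm in one dimension and returns the within-cluster sum of squares (WCSS), which can be compared across values of k. The function name `kmeans_1d`, the feature, and the sample data are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar features; returns (labels, centroids, wcss)."""
    xs = sorted(values)
    # Deterministic init: pick k evenly spaced quantiles of the sorted data.
    centroids = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    # Within-cluster sum of squares, used to compare different k.
    wcss = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, centroids, wcss


# Hypothetical mean loads (kW) for six transformers: two clear groups.
loads = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
for k in (1, 2, 3):
    print(k, round(kmeans_1d(loads, k)[2], 3))
```

Picking the k after which WCSS stops dropping sharply (the elbow) is one common heuristic; the paper may determine its optimal k differently.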

Highlights

With the development of smart infrastructure in electrical grids, the data collected from various units and locations over time have begun to receive the attention of grid operators and research centers.
The proposed approach was implemented on Apache Spark to address the computation-time challenges of handling big data and to submit jobs in parallel using an optimized methodology.
A large number of distribution transformer (DT) training procedures were performed with reduced run times, which allows handling big data that is too large to be stored.
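The paper's own scheduling algorithm is not reproduced in this summary. As a stand-in that illustrates why job ordering matters when many DT training runs share a fixed pool of workers, the sketch below uses the classic longest-processing-time-first (LPT) heuristic, which is not claimed to be the authors' method. It assigns jobs with estimated durations to parallel slots and reports the makespan (time until the last slot finishes). Job names, durations, and the slot count are hypothetical.

```python
import heapq


def lpt_schedule(durations, slots):
    """Greedy LPT: give each job, longest first, to the least-loaded slot.

    durations: dict of job name -> estimated run time (any consistent unit).
    Returns (assignment: slot -> list of job names, makespan).
    """
    heap = [(0.0, s) for s in range(slots)]  # (current load, slot index)
    heapq.heapify(heap)
    assignment = {s: [] for s in range(slots)}
    # Placing long jobs first leaves short jobs to even out the final loads.
    for job, d in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, s = heapq.heappop(heap)  # least-loaded slot
        assignment[s].append(job)
        heapq.heappush(heap, (load + d, s))
    makespan = max(load for load, _ in heap)  # time until the last slot finishes
    return assignment, makespan


# Hypothetical per-transformer training times (minutes) on 2 parallel slots.
jobs = {"dt_001": 7, "dt_002": 5, "dt_003": 4, "dt_004": 3, "dt_005": 3, "dt_006": 2}
assignment, makespan = lpt_schedule(jobs, slots=2)
print(assignment, makespan)
```

Run one after another, the six example jobs take 24 minutes; on two slots the LPT assignment finishes in 12. LPT is only a heuristic, but Graham's bound keeps its makespan within 4/3 of the optimum, which makes it a reasonable default when job durations can be estimated in advance.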

Summary

INTRODUCTION

With the development of smart infrastructure in electrical grids, the data collected from various units and locations over time have begun to receive the attention of grid operators and research centers. In the big data context, splitting a large volume of data into multiple datasets for training ML models has significantly improved the learning process. A novel scheduling technique based on the Apache Spark platform is proposed to perform short-term load forecasting for all one thousand transformers simultaneously. The proposed method performs load forecasting by submitting multiple jobs concurrently on the datasets, making optimal use of the cluster resources. The main contributions of this paper can be summarized as follows:

1) Proposing an optimal scheduling algorithm to perform load forecasting with parallel and distributed execution in a multi-AMI environment on smart grid big data.

Figure: Scheduling algorithm proposed to perform load forecasting on multiple datasets using Apache Spark.
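The introduction describes submitting multiple forecasting jobs concurrently so that the cluster stays busy. Spark's job scheduling documentation notes that, within one SparkContext, jobs submitted from separate threads can run simultaneously, and setting `spark.scheduler.mode` to `FAIR` shares resources between them. The sketch below shows only the driver-side pattern with Python's `concurrent.futures`; to keep it self-contained, each job is a hypothetical seasonal-naive forecast rather than the tree-based Spark models used in the paper, and all names and data are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def seasonal_naive_forecast(series, season=24, horizon=24):
    """Hypothetical stand-in job: repeat the last observed season (e.g. 24 hours)."""
    last = series[-season:]
    return [last[h % season] for h in range(horizon)]


def run_concurrently(datasets, max_workers=4):
    """Submit one forecasting job per dataset from a pool of driver threads.

    With PySpark, each thread would trigger its own Spark job on the shared
    SparkContext; here the jobs are plain Python so the example runs anywhere.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(seasonal_naive_forecast, series)
                   for name, series in datasets.items()}
        # result() blocks until each job finishes and re-raises its errors.
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical hourly load for three transformers over two days.
datasets = {f"dt_{i}": [float(h % 24 + i) for h in range(48)] for i in range(3)}
forecasts = run_concurrently(datasets)
print({name: fc[:3] for name, fc in forecasts.items()})
```

Threads suit this pattern because a PySpark driver thread mostly waits on the cluster while its job runs; for CPU-bound pure-Python work, the global interpreter lock would limit any speed-up.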

RELATED WORK

OPTIMAL SCHEDULING ALGORITHM

CONSIDERING COMMUNICATION COSTS

OBJECTIVE FUNCTION

PERFORMANCE EVALUATION

Findings

CONCLUSION

Journal: IEEE Access, Vol. 9	Publication Date: Jan 1, 2021
Citations: 24	License type: CC BY 4.0

Similar Papers

Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.
Haohui Lu ... Shahadat Uddin
PLOS ONE, Vol. 19, 18 Apr 2024

Performance Evaluation of Distributed Machine Learning for Load Forecasting in Smart Grids
Dabeeruddin Syed ... Shady S Refaat
01 Jan 2020

Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark
Mousumi Chaudhury ... Amin Karami
Electronics, Vol. 11, 17 Aug 2022

Hybrid Machine Learning-Based Intelligent Technique for Improved Big Data Analytics
Andronicus A Akinyelu
01 Nov 2019
