Abstract

The present article describes a concept for the creation and application of energy forecasting models in a distributed environment. Additionally, a benchmark comparing the time required for the training and application of data-driven forecasting models on a single computer and a computing cluster is presented. This comparison is based on a simulated dataset and both R and Apache Spark are used. Furthermore, the obtained results show certain points in which the utilization of distributed computing based on Spark may be advantageous.

Highlights

  • The transformation of the current energy grid into a Smart Grid [1] is an ongoing challenge in the pursuit of an environmentally-friendly energy supply

  • The benchmark conducted in the present paper has two main goals: (i) to assess the necessary time to obtain data-driven forecasting models on a distributed environment and (ii) to determine the point at which a Big Data computing framework based on Spark becomes necessary

  • The former is shown by the fact that Spark on the cluster has–—for the conducted benchmark—the lowest computation times for training and evaluating a complex datadriven model, i.e. a random forest; the latter is shown by Spark on the computing cluster outpacing both single computer approaches independently of the utilized technique once a data amount threshold is surpassed

Read more

Summary

Introduction

The transformation of the current energy grid into a Smart Grid [1] is an ongoing challenge in the pursuit of an environmentally-friendly energy supply. Methods (benchmark) The benchmark conducted in the present paper has two main goals: (i) to assess the necessary time to obtain data-driven forecasting models on a distributed environment and (ii) to determine the point at which a Big Data computing framework based on Spark becomes necessary. To achieve these goals, a test scenario is conducted in which the times needed for training and evaluating data-driven forecasting models on a single computer and in a distributed environment are calculated and compared. The utilized versions of R and Spark are 3.3.3 and 2.1 respectively

Results and discussion
Evaluation Random forest MLR
Evaluation
Conclusion and outlook
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call