Concept and benchmark results for Big Data energy forecasting based on Apache Spark

Jorge Ángel González Ordiano, Nicolas Renkamp, Ralf Mikut, Clemens Düpmeier, Eric Braun, Simon Waczowicz, Nicole Ludwig, Nico Peter, Veit Hagenmeyer, Andreas Bartschat

doi:10.1186/s40537-018-0119-6

Open Access

https://doi.org/10.1186/s40537-018-0119-6

Journal: Journal of Big Data
Publication Date: Mar 6, 2018
Citations: 7
License type: open-access

Affiliation: Karlsruhe Institute of Technology

Abstract

The present article describes a concept for the creation and application of energy forecasting models in a distributed environment. Additionally, a benchmark comparing the time required for the training and application of data-driven forecasting models on a single computer and a computing cluster is presented. This comparison is based on a simulated dataset and both R and Apache Spark are used. Furthermore, the obtained results show certain points in which the utilization of distributed computing based on Spark may be advantageous.

Highlights

The transformation of the current energy grid into a Smart Grid [1] is an ongoing challenge in the pursuit of an environmentally-friendly energy supply
The benchmark conducted in the present paper has two main goals: (i) to assess the necessary time to obtain data-driven forecasting models on a distributed environment and (ii) to determine the point at which a Big Data computing framework based on Spark becomes necessary
The former is shown by the fact that Spark on the cluster has, for the conducted benchmark, the lowest computation times for training and evaluating a complex data-driven model, i.e. a random forest; the latter is shown by Spark on the computing cluster outpacing both single-computer approaches, independently of the utilized technique, once a data amount threshold is surpassed

Summary

Introduction

The transformation of the current energy grid into a Smart Grid [1] is an ongoing challenge in the pursuit of an environmentally-friendly energy supply.

Methods (benchmark)

The benchmark conducted in the present paper has two main goals: (i) to assess the necessary time to obtain data-driven forecasting models on a distributed environment and (ii) to determine the point at which a Big Data computing framework based on Spark becomes necessary. To achieve these goals, a test scenario is conducted in which the times needed for training and evaluating data-driven forecasting models on a single computer and in a distributed environment are calculated and compared. The utilized versions of R and Spark are 3.3.3 and 2.1, respectively.
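The comparison described in the benchmark can be illustrated with a minimal Python sketch. It is not the authors' benchmark code: the helper names `time_call` and `crossover_size` are hypothetical, and the timing values in the usage example are invented purely for illustration, not taken from the paper. The sketch measures wall-clock time for a callable and then finds the smallest tested data size from which the cluster is faster than the single computer at every larger tested size, which is one possible reading of goal (ii).

```python
import time
from typing import Callable, Optional, Sequence


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call of fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def crossover_size(
    sizes: Sequence[int],
    single_times: Sequence[float],
    cluster_times: Sequence[float],
) -> Optional[int]:
    """Smallest tested size from which the cluster stays faster.

    Returns None if the cluster is not faster at the largest tested size.
    """
    if not (len(sizes) == len(single_times) == len(cluster_times)):
        raise ValueError("all three sequences must have the same length")

    threshold = None
    # Walk from the largest size downwards and stop at the first size
    # where the single computer is at least as fast as the cluster.
    rows = sorted(zip(sizes, single_times, cluster_times), reverse=True)
    for size, t_single, t_cluster in rows:
        if t_cluster < t_single:
            threshold = size
        else:
            break
    return threshold


if __name__ == "__main__":
    # Invented timings in seconds (assumption, not results from the paper).
    sizes = [1_000, 10_000, 100_000, 1_000_000]
    single = [0.1, 0.8, 9.0, 120.0]
    cluster = [5.0, 5.5, 7.0, 20.0]
    print(crossover_size(sizes, single, cluster))  # 100000
```

In a real run, the timing lists would be filled by wrapping each training and evaluation step in `time_call`, once per data size and per environment.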

Results and discussion

Evaluation Random forest MLR

Evaluation

Conclusion and outlook

Similar Papers

On the performance of forecasting models in the presence of input uncertainty
Hossein Sangrody ... Ahmad Shokrollahi
01 Sep 2017

Apache Spark usage and deployment models for scientific computing
Diogo Castro ... Piotr Mrowczynski
EPJ Web of Conferences | Vol. 214
01 Jan 2019

Scalable Generalized Multitarget Linear Regression With Output Dependence Estimation
Julio Camejo Corona ... Carlos Morell
01 Jan 2020

Reviewing the security surveillance of AMI using big data analytics
Sheeraz Niaz Lighari ... Dil Muhammad Akbar Hussain
01 Nov 2017
