Abstract

The pervasive and increasing deployment of smart meters makes it possible to collect a huge amount of fine-grained energy data in different urban scenarios. The analysis of such data is challenging and opens up a variety of new and interesting research issues across the energy and computer science research areas. The key role of computer scientists is to provide energy researchers and practitioners with cutting-edge, scalable analytics engines that effectively support their daily research activities, hence fostering and leveraging data-driven approaches. This paper presents SPEC, a scalable and distributed engine to predict building-specific power consumption. SPEC addresses the full analytic stack and exploits a data stream approach over sliding time windows to train a prediction model tailored to each building. The model predicts the upcoming power consumption at a time instant in the near future. SPEC integrates different machine learning approaches, specifically ridge regression, artificial neural networks, and random forest regression, to predict fine-grained values of power consumption, and a classification model, the random forest classifier, to forecast a coarse consumption level. SPEC exploits state-of-the-art distributed computing frameworks to address the big data challenges in harvesting energy data: the current implementation runs on Apache Spark, the most widespread high-performance data-processing platform, and can natively scale to huge datasets. As a case study, SPEC has been tested on real data from a heating distribution network and on power consumption data collected in a major Italian city. Experimental results demonstrate the effectiveness of SPEC in forecasting both fine-grained values and coarse levels of power consumption of buildings.
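
The sliding-window idea mentioned in the abstract can be illustrated with a minimal, single-machine sketch. It is not SPEC's implementation, which runs on Apache Spark. The record layout `(building_id, timestamp, power)` and the names `sliding_windows`, `per_building_samples`, `window`, and `horizon` are assumptions introduced here for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (building_id, timestamp, power reading).
Reading = Tuple[str, int, float]
Sample = Tuple[List[float], float]


def sliding_windows(values: Sequence[float], window: int, horizon: int = 1) -> List[Sample]:
    """Pair each run of `window` consecutive readings with the reading
    observed `horizon` steps after the end of that run."""
    samples: List[Sample] = []
    last_start = len(values) - window - horizon
    for start in range(last_start + 1):
        features = list(values[start:start + window])
        target = values[start + window + horizon - 1]
        samples.append((features, target))
    return samples


def per_building_samples(readings: Sequence[Reading], window: int,
                         horizon: int = 1) -> Dict[str, List[Sample]]:
    """Group readings by building, order them by time, and build one
    training set per building, so each building gets its own model."""
    by_building: Dict[str, List[float]] = defaultdict(list)
    for building_id, _timestamp, power in sorted(readings, key=lambda r: (r[0], r[1])):
        by_building[building_id].append(power)
    return {b: sliding_windows(v, window, horizon) for b, v in by_building.items()}
```

Each `(features, target)` pair could then be passed to any regressor, such as ridge regression, trained separately for each building.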

Highlights

  • In the last few years, an increasing number of smart meters has been deployed in smart city environments to monitor energy consumption in buildings

  • Among the available techniques suited to the classification problem, SPEC provides the random forest classifier (RFC)

  • SPEC integrates three metrics to evaluate the quality of regression-based models, namely (i) mean absolute percentage error (MAPE), (ii) weighted absolute percentage error (WAPE), and (iii) symmetric mean absolute percentage error (SMAPE), and one metric, accuracy, to evaluate the classification-based model
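
The four evaluation metrics named in the highlights can be sketched in a few lines. The paper's exact formulas are not reproduced in this excerpt, so the code below uses the commonly cited definitions. SMAPE in particular has several variants, and the one shown, with the mean of |actual| and |predicted| in the denominator, is an assumption. The function names are illustrative.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.
    Assumes no actual value is zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def wape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Weighted absolute percentage error, in percent:
    total absolute error divided by total absolute actual value."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * total_error / sum(abs(a) for a in actual)


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric MAPE, in percent (assumed variant, bounded by 200).
    Assumes |actual| + |predicted| is never zero."""
    ratios = [2.0 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def accuracy(actual_levels: Sequence[str], predicted_levels: Sequence[str]) -> float:
    """Fraction of consumption levels predicted correctly."""
    hits = sum(a == p for a, p in zip(actual_levels, predicted_levels))
    return hits / len(actual_levels)
```

MAPE weights every reading equally and becomes unstable when actual consumption is close to zero, whereas WAPE normalizes by total consumption and is therefore less sensitive to individual small readings.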


Summary

Introduction

In the last few years, an increasing number of smart meters has been deployed in smart city environments to monitor energy consumption in buildings. Effectively mining large collections of energy data with state-of-the-art data mining algorithms has often required addressing crucial limitations, such as those imposed by computational resources. To this aim, scalable solutions have been devised in recent years, including widespread big data frameworks like Apache Hadoop [1] and Apache Spark [2]. From the computer scientist’s point of view, most of the technologies and algorithms related to big data processing and analytics have to be tailored to the specific features of the energy domain, such as heterogeneous sources and formats, variable data distributions, and different abstraction levels (both fine and coarse grained), to effectively and efficiently support the knowledge extraction process. After comparing our approach with related work, we draw conclusions and present future work.

Distributed Frameworks
Contribution of This Work
Related Work
The SPEC Engine
Data Preprocessing
Data Stream Processing
Prediction Analysis
Prediction Validation
Experimental Results
Conclusions and Future Works