The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition

Clayton Miller, Bianca Picchetti, Brodie W Hobson, Forrest Meggers, Anjukan Kathirgamanathan, June Young Park, Zixiao Shi, Paul Raftery, Zoltan Nagy, Pandarasamy Arjunan

doi:10.1038/s41597-020-00712-x

Abstract

This paper describes an open data set of 3,053 energy meters from 1,636 non-residential buildings spanning two full years (2016 and 2017) at an hourly frequency (17,544 measurements per meter, resulting in approximately 53.6 million measurements). These meters were collected from 19 sites across North America and Europe, with one or more meters per building measuring whole building electrical, heating and cooling water, steam, and solar energy as well as water and irrigation meters. Part of these data was used in the Great Energy Predictor III (GEPIII) competition hosted by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) in October-December 2019. GEPIII was a machine learning competition for long-term prediction with an application to measurement and verification. This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data. This data set can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.
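The measurement counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, assuming one reading per meter per hour with no gaps and timestamps without daylight-saving shifts; the helper name `hourly_steps` is hypothetical and does not come from the data set's own tooling.

```python
from datetime import datetime

N_METERS = 3053  # meter count stated in the abstract


def hourly_steps(start: datetime, end: datetime) -> int:
    """Number of hourly timestamps in the half-open interval [start, end)."""
    return int((end - start).total_seconds() // 3600)


# Two full calendar years: 2016 (a leap year, 366 days) and 2017 (365 days).
per_meter = hourly_steps(datetime(2016, 1, 1), datetime(2018, 1, 1))
total = per_meter * N_METERS

print(per_meter)  # 17544
print(total)      # 53561832, i.e. roughly 53.6 million
```

The extra day in 2016 accounts for the 24 readings above 2 x 8,760, which is why the per-meter count is 17,544 rather than 17,520.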

Highlights

Building performance analytics and commissioning processes have significant opportunities to save energy, reduce carbon emissions of buildings, and reduce the operating costs of building owners world-wide[1].
This paper focuses on the development of a data set that builds upon these motivations.
Each of the buildings has metadata such as area, weather, and primary use type collated. This data set can be used to benchmark various statistical learning algorithms and other data science techniques. It can also be used simply as a teaching or learning tool to practice dealing with measured performance data from large numbers of non-residential buildings.

Background & Summary

Building performance analytics and commissioning processes have significant opportunities to save energy, reduce carbon emissions of buildings, and reduce the operating costs of building owners world-wide[1]. Despite the significant body of research developed, there is still a lack of understanding of how to scale techniques across the highly heterogeneous building stock[2]. When it comes to machine learning innovation in academia, one of the most significant assets can be large and open data sets that the community can use to prototype and quantitatively compare techniques in ways that show better value in terms of speed, accuracy, or implementation ease. This statement is supported by the significant efforts in time-series data classification[3], image recognition[4], and the larger machine learning community in general, both hardware and software[5].

Methods

Findings

Code availability