Sensor devices are becoming omnipresent, supplying data to a wide range of applications. In the building sector, sensors, together with other information sources, provide the basis for smart building functionalities. Predicting energy loads and inferring the occupancy status of spaces are important tasks that promote energy efficiency and user comfort in buildings. For these tasks, as for many other smart building applications, machine learning models built on sensor data are commonly applied. This article builds an understanding of the environment in which such machine learning models have to operate by examining the properties and quality aspects of data from public buildings provided by indoor sensor devices. This is done through a thorough case study of two real-life data sets from university campus buildings located in different climates and using very different sensor network configurations. The outcomes include information about the heterogeneity, correlations, and temporal patterns present in sensor data. They also show that the building field needs to better acknowledge the quality deficiencies of sensor data. Our results aid in assessing and improving the quality of sensor-based indoor data used in machine learning modelling, in evaluating whether a data set is representative enough to build a model that is robust under changing conditions in the building, and in choosing an appropriate number of sensors per space when building an indoor wireless sensor network.