Abstract

The rapid development in data science and the increasing availability of building operational data have provided great opportunities for developing data-driven solutions for intelligent building energy management. Data preprocessing serves as the foundation for valid data analyses. It is an indispensable step in building operational data analysis considering the intrinsic complexity of building operations and deficiencies in data quality. Data preprocessing refers to a set of techniques for enhancing the quality of the raw data, such as outlier removal and missing value imputation. This article serves as a comprehensive review of data preprocessing techniques for analysing massive building operational data. A wide variety of data preprocessing techniques are summarised in terms of their applications in missing value imputation, outlier detection, data reduction, data scaling, data transformation, and data partitioning. In addition, three state-of-the-art data science techniques are proposed to tackle practical data challenges in the building field, i.e., data augmentation, transfer learning, and semi-supervised learning. In-depth discussions have been presented to describe the pros and cons of existing preprocessing methods, possible directions for future research and potential applications in smart building energy management. The research outcomes are helpful for the development of data-driven research in the building field.

Highlights

  • As highlighted by the International Energy Agency (IEA), the building sector has become the largest energy consumer in the world, and accounts for more than a third of global energy consumption (IEA, 2019)

  • The results showed that the recursive feature elimination (RFE) method was able to automatically and objectively choose the optimal input combination for different predictive algorithms from different datasets, resulting in more flexibility in real applications (Fan et al., 2014)

  • The results showed that the nonlinear features extracted could improve the accuracy of building energy prediction models, while the other conventional feature extraction methods may not be able to enhance the prediction performance given different supervised learning algorithms


Summary

INTRODUCTION

As highlighted by the International Energy Agency (IEA), the building sector has become the largest energy consumer in the world and accounts for more than a third of global energy consumption (IEA, 2019). The first group of missing value imputation methods includes mean imputation, forward or backward imputation, and moving average methods. In such cases, missing values are inferred from the data characteristics of that variable alone, so these are called univariate methods. The forward or backward method replaces the missing value with the previous or next data measurement. Regression-based imputation methods typically adopt machine learning algorithms to capture cross-sectional or temporal data dependencies for missing value imputation (Jenghara et al., 2018). The generalised extreme studentised deviate (GESD) method has proved to be computationally efficient in detecting outliers in building energy data (Fan et al., 2014). It assumes that the data follow a normal distribution, which may not be the case for actual building variables. Ashouri et al. used such a method to remove outliers in building energy data, based on which regression models were developed for data replacements (Ashouri et al., 2018; Ashouri et al., 2020).
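The univariate imputation methods named above can be illustrated with a minimal sketch. It is not taken from the reviewed article: the function names, the `power_kw` series, and the choice of a centred window for the moving average are assumptions made for illustration, and missing readings are represented as `None`.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing reading


def mean_impute(values: Series) -> Series:
    """Replace every gap with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn from; leave gaps untouched
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def forward_fill(values: Series) -> Series:
    """Replace each gap with the most recent earlier observation."""
    filled: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if no earlier observation exists
    return filled


def backward_fill(values: Series) -> Series:
    """Replace each gap with the next later observation."""
    return forward_fill(values[::-1])[::-1]


def moving_average_impute(values: Series, window: int = 2) -> Series:
    """Replace each gap with the mean of the observed values lying within
    `window` positions on either side (assumed centred window; only
    original observations are used, never previously imputed ones)."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            lo = max(0, i - window)
            neighbours = [x for x in values[lo:i + window + 1] if x is not None]
            if neighbours:
                filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical hourly power readings (kW) with three missing values.
power_kw: Series = [10.0, None, 14.0, None, None, 20.0]

print(mean_impute(power_kw))
print(forward_fill(power_kw))
print(backward_fill(power_kw))
print(moving_average_impute(power_kw, window=1))
```

Each function uses only the variable's own history, which is what makes these methods univariate. A regression-based method would instead draw on other variables or on a fitted temporal model.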

