Data-driven methods are the most studied type of fault detection and diagnostics (FDD) for building HVAC systems. However, most studies rely on labeled data for specific faults, which are hard to find and collect for real systems. Although fault-free data is easier to collect, labeling it is still time-consuming for the operation of large systems. Moreover, most studies use supervised learning algorithms, which do not generalize well beyond the training data, making unseen faults hard to detect. In this paper, we define a methodology for HVAC FDD based on self-supervised learning with a Transformer encoder, and we test it on a real case study. Portions of the multivariate time-series data are masked using a two-state Markov chain, and the model is trained to predict these concealed segments. Because this approach does not depend on labeled data, it offers a scalable solution for practical HVAC applications. Anomalies are labeled using the Peak Over Threshold (POT) method, which determines thresholds dynamically by fitting the reconstruction errors to a generalized Pareto distribution. Subsequent fault diagnostics focus on the features with the most pronounced reconstruction errors to pinpoint potential HVAC malfunctions. The methodology reduces dependence on labeled datasets and improves the model's generalization, facilitating the detection of previously unobserved faults. The approach was applied to data from a real building. As a result, multiple faults were detected, mainly due to malfunctioning of the monitoring system. The model demonstrates the ability to detect both sequential and individual faults. The period from October 19th to December 23rd was detected as a fault period because of a change in the trend of the data caused by the monitoring system.
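To make the masking step concrete, the following is a minimal sketch of how a two-state Markov chain can generate a mask for one variable of a time series. It is an illustration only, not the authors' implementation: the function name `markov_mask`, the parameters `mean_masked_len` and `masked_ratio`, and their default values are assumptions introduced here, since the abstract does not state the transition probabilities.

```python
import random


def markov_mask(length, mean_masked_len=3.0, masked_ratio=0.15, seed=None):
    """Return a list of booleans (True = masked) from a two-state Markov chain.

    Hypothetical parameters (not taken from the paper):
      mean_masked_len: average length of a masked segment, must be >= 1
      masked_ratio:    long-run fraction of masked time steps, in (0, 1)
    """
    if mean_masked_len < 1.0 or not 0.0 < masked_ratio < 1.0:
        raise ValueError("need mean_masked_len >= 1 and 0 < masked_ratio < 1")
    rng = random.Random(seed)
    # Leaving the masked state with this probability makes masked
    # segment lengths geometric with mean `mean_masked_len`.
    p_leave_masked = 1.0 / mean_masked_len
    # Chosen so the chain's stationary masked fraction equals `masked_ratio`.
    p_leave_unmasked = p_leave_masked * masked_ratio / (1.0 - masked_ratio)
    # Draw the initial state from the stationary distribution.
    masked = rng.random() < masked_ratio
    mask = []
    for _ in range(length):
        mask.append(masked)
        p_switch = p_leave_masked if masked else p_leave_unmasked
        if rng.random() < p_switch:
            masked = not masked
    return mask


# Example: one mask for a single variable of a 20-step window.
example_mask = markov_mask(20, seed=0)
```

Compared with masking each time step independently, a chain of this kind tends to hide contiguous segments, so the model cannot fill a gap simply by copying adjacent values. In a multivariate setting, one plausible choice is to draw a separate mask per variable.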