For efficient operation of district heating systems, detecting anomalies and faults at an early stage is highly desirable. Data-driven machine learning methods can be a cornerstone here, particularly for fault detection in district heating substations, where the availability of heat meter data keeps increasing. However, creating data sets suitable for training such machine learning models poses challenges to researchers and practitioners alike. To address this problem, we propose a systematic, domain-specific process for creating fault detection data sets, presented as practical guidelines. This process concretizes CRISP-DM, the cross-industry standard process for data science and data mining, for the district heating domain and focuses on the process steps of goal definition, data acquisition and understanding, and data curation. We aim to enable researchers and practitioners to create data sets for fault detection in the district heating domain and thereby also to enable the creation or improvement of machine learning models in this domain. In addition, we propose a minimum viable feature set for fault detection in district heating networks, with the goal of enabling better cooperation between researchers and easier transfer of the resulting machine learning models, so that new progress spreads more readily through the field.