Semantic-Similarity-Based Schema Matching for Management of Building Energy Data

Zhiyu Pan, Guanchen Pan, Antonello Monti

doi:10.3390/en15238894

Open Access

Abstract

The growing volume of heterogeneous data in the building energy domain makes data integration difficult. Schema matching, which maps raw data from the building energy domain to a generic data model, is a necessary step in data integration and provides a unique representation. Only a small amount of labeled data exists for schema matching, and labeling data manually is time-consuming and labor-intensive. This paper applies semantic-similarity methods from natural language processing to automate the schema-mapping process, which reduces the manual effort in heterogeneous data integration. Active learning is applied to address the lack of labeled data in schema matching. Results on building-energy-domain data show that a pre-trained language model substantially improves the accuracy of schema matching and that active learning greatly reduces the amount of labeled data required.
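To make the idea of similarity-based schema matching concrete, here is a minimal sketch and not the paper's implementation. It assumes a character-trigram cosine similarity as a simple stand-in for embeddings from a pre-trained language model, and it uses the margin between the two best candidates as one possible uncertainty criterion for active learning. All function names and the example field names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(label):
    """Split a schema label such as 'powerConsumption' into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)  # break camelCase
    return [t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t]


def trigram_vector(label):
    """Bag of character trigrams per token (toy stand-in for a language-model embedding)."""
    grams = Counter()
    for token in tokenize(label):
        padded = f"#{token}#"
        for i in range(len(padded) - 2):
            grams[padded[i:i + 3]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_schema(raw_fields, target_attributes):
    """Map each raw field to (best target attribute, score, margin over runner-up)."""
    targets = {t: trigram_vector(t) for t in target_attributes}
    result = {}
    for field in raw_fields:
        vec = trigram_vector(field)
        ranked = sorted(
            ((cosine(vec, tv), t) for t, tv in targets.items()), reverse=True
        )
        best_score, best_target = ranked[0]
        margin = best_score - ranked[1][0] if len(ranked) > 1 else best_score
        result[field] = (best_target, best_score, margin)
    return result


def select_for_labeling(matches, k):
    """Uncertainty sampling: return the k fields with the smallest margin."""
    return sorted(matches, key=lambda f: matches[f][2])[:k]


# Hypothetical raw field names and generic data-model attributes.
matches = match_schema(
    ["room_temp", "powerConsumption", "hum"],
    ["temperature", "power", "humidity"],
)
to_label = select_for_labeling(matches, k=1)
```

In a realistic setting, `trigram_vector` would be replaced by sentence embeddings from a pre-trained language model, and the fields returned by `select_for_labeling` would be passed to a human annotator before the matcher is retrained.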

Full Text