Enhanced semantic representation model for multisource point of interest attribute alignment

Pengpeng Li, Yong Wang, Jiping Liu, An Luo, Shenghua Xu, Zhiran Zhang

doi:10.1016/j.inffus.2023.101852

Abstract

Multisource point of interest (POI) attribute alignment is the process of reconciling heterogeneous attribute values that describe the same POI in different data sources, and it is one of the key technologies for geospatial data fusion. However, semantic heterogeneity among POI data sources, in the form of synonyms and homographs, makes multisource POI data fusion challenging. This paper proposes a multisource POI attribute alignment method based on the Enhanced Semantic Representation Model (ESRM). First, the unlabeled corpus is preprocessed by Chinese word segmentation and attribute expression sequence construction. Then, the ESRM is pre-trained using two tasks: relational consistency prediction and a replacement language model. Finally, the model is fine-tuned through supervised learning for each downstream attribute alignment task on multisource POI data. We used the POI attributes of Baidu Map, Tencent Map, and Gaode Map in Chengdu, China as the experimental data. The results show that the proposed model outperforms existing methods for attribute alignment. Specifically, category attribute consistency achieves a Macro-F1 value of over 90%, and address attribute standardization achieves a BLEU-4 score of over 95%.
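For readers unfamiliar with the Macro-F1 metric reported for category attribute consistency, the following is a minimal sketch of how it is commonly computed: the F1 score is calculated separately for each class and then averaged with equal weight per class. The function name `macro_f1` and the example labels are illustrative assumptions and are not taken from the paper, whose evaluation code is not shown here.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical category labels, for illustration only.
y_true = ["food", "food", "hotel", "hotel"]
y_pred = ["food", "hotel", "hotel", "hotel"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, Macro-F1 penalizes poor performance on rare categories more than an accuracy figure would.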

Full Text