Abstract In real-view 3D construction, different departments describe the same geospatial object in different ways. Linking a city's multi-source datasets to the same entity and constructing the mapping relationships among those data has therefore become a central difficulty in multi-source data convergence. To address the problem of unifying geographic entities during data convergence, this paper proposes a double-matching-degree entity recognition method. First, address information is extracted from the original address data according to its address elements, and a hierarchical address tree, organized as a dictionary tree (trie), is built for level-by-level matching. A BERT model and a Siamese (twin) neural network are then introduced: the address information on the remaining paths is segmented at the byte and word levels, and its similarity is calculated. Next, the similarity of entity names is calculated with an edit-distance-based method. By combining the string-based and neural-network-based methods, matching proceeds hierarchically, step by step, to narrow the matching space, determine the matching entities, and establish the entity links. Experiments on a self-built dataset show that this method outperforms the other methods compared in entity recognition.
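The two string-based steps named in the abstract, hierarchical matching over a dictionary tree and edit-distance similarity of entity names, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the class and function names (`AddressTrieNode`, `insert_address`, `match_prefix`, `name_similarity`) are hypothetical, the normalization of edit distance by the longer string's length is an assumption, and the BERT and Siamese-network step for the remaining address path is omitted.

```python
from typing import Dict, List, Tuple


class AddressTrieNode:
    """One level of the address hierarchy (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "AddressTrieNode"] = {}
        self.entity_ids: List[str] = []


def insert_address(root: AddressTrieNode, elements: List[str], entity_id: str) -> None:
    """Insert an address, given as ordered address elements, into the tree."""
    node = root
    for element in elements:
        node = node.children.setdefault(element, AddressTrieNode())
    node.entity_ids.append(entity_id)


def match_prefix(root: AddressTrieNode, elements: List[str]) -> Tuple[int, AddressTrieNode]:
    """Walk the tree level by level; return how many leading elements matched
    and the node reached. Unmatched trailing elements are the 'remaining path'."""
    node = root
    depth = 0
    for element in elements:
        child = node.children.get(element)
        if child is None:
            break
        node = child
        depth += 1
    return depth, node


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

In this sketch, a query address is first narrowed by `match_prefix`, and only the entities under the node it reaches would need to be compared by name or by a learned similarity model, which is how hierarchical matching reduces the matching space.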