This study explores an efficient technique for matching multisource homonymous geographical entities in railways, focusing on railway line vector spatial data. Building on statistical feature matching of attribute data, a curve similarity calculation method based on the dynamic time warping (DTW) algorithm is designed; it achieves better local elastic matching and overcomes the limitations of the Fréchet algorithm. The empirical study uses railway line layer data from two data sources within Beijing’s jurisdiction, fusing 6237 line segments from source 2 with 105 long lines from source 1. The structures of the two data sources are compared statistically, and text similarity is calculated from TF-IDF cosine similarity, using the maximum similarity value. Finally, the DTW algorithm is implemented in Python to compute curve similarity. The experimental results show a mean DTW distance of 3.92, a standard deviation of 4.63, and a mode of 0.005. Similarity measurements indicate that 95.53% of records fall within the predetermined threshold, demonstrating the effectiveness and applicability of the method. The findings improve the accuracy of railway data matching, support the informatization of the railway industry, and are relevant to improving railway operational efficiency and system performance.
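
The DTW-based curve similarity described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dtw_distance`, the representation of a line as a list of (x, y) vertices, and the use of Euclidean distance as the local cost are all assumptions made for illustration.

```python
import math


def dtw_distance(line_a, line_b):
    """Dynamic time warping distance between two polylines.

    Each polyline is a sequence of (x, y) vertices. The local cost is the
    Euclidean distance between vertices (an assumption; the abstract does
    not state which cost the study uses).
    """
    n, m = len(line_a), len(line_b)
    if n == 0 or m == 0:
        raise ValueError("both polylines need at least one vertex")

    # prev[j] holds the accumulated cost of aligning the first i-1 vertices
    # of line_a with the first j vertices of line_b.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(line_a[i - 1], line_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


# Hypothetical example: the same track digitized with a different vertex count.
segment = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
resampled = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shifted = [(0.0, 1.0), (1.0, 1.0)]

print(dtw_distance(segment, resampled))        # repeated vertex adds no cost
print(dtw_distance(segment[:2], shifted))      # parallel line one unit away
```

Because one vertex may align with several vertices on the other line, two digitizations of the same track with different vertex densities can still score a distance near zero, which is the local elastic matching the abstract refers to. A match decision would then compare this distance with a threshold; the abstract does not give the threshold value, so none is assumed here.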