Complex nested and discontinuous location references are common in unstructured text. Extracting them is essential for accurate location information retrieval and spatial inference. However, traditional methods struggle with these references because of limitations in their annotation systems and model architectures. In this study, we introduce a deep learning approach that uniformly recognizes flat, nested, and discontinuous location references, motivated by the task of recognizing fine-grained expressway location references. The approach uses a pre-trained language model to generate semantic sentence representations and a distance- and direction-aware Transformer for contextual encoding. It then recognizes location references by modeling the adjacency and boundary relations between word pairs. We evaluated the approach on seven benchmark datasets and compared it with state-of-the-art methods. The results show that the approach achieves higher accuracy with faster inference, validating our modeling paradigm and architecture. An ablation study further confirms the effectiveness of the architecture's submodules. These findings can provide valuable insights for developing advanced unified location reference recognition methods. Moreover, the detailed labeled dataset for location references can facilitate the evaluation and comparison of unified recognition methods and systems.