TopoBERT: a plug and play toponym recognition module harnessing fine-tuned BERT

Bing Zhou, Lei Zou, Yingjie Hu, Yi Qiang, Daniel Goldberg

doi:10.1080/17538947.2023.2239794

Abstract

Extracting precise geographical information from text, referred to as toponym recognition, is fundamental to geographical information retrieval and crucial to many spatial analyses, e.g. mining location-based information from social media, news reports, and surveys for various applications. However, existing toponym recognition methods and tools perform poorly on tasks that rely on extracting fine-grained geographic information from text, e.g. locating people who send help requests containing addresses through social media during disasters. Emerging pretrained language models have revolutionized natural language processing and understanding by machines, offering a promising pathway to improve toponym recognition so that it can underpin practical applications. In this paper, TopoBERT, a toponym recognition module based on a one-dimensional Convolutional Neural Network (CNN1D) and Bidirectional Encoder Representations from Transformers (BERT), is proposed and fine-tuned. Three datasets are used to tune the hyperparameters and identify the best training strategy. Another seven datasets are used to evaluate performance. TopoBERT achieves state-of-the-art performance (average F1-score = 0.854) compared with seven baseline models. It is packaged as easy-to-use Python scripts and can be applied to diverse toponym recognition tasks without additional training.
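To make the reported metric concrete, the following is a minimal sketch of how an F1-score averaged over several evaluation datasets could be computed. It rests on assumptions that the abstract does not state: that a predicted toponym counts as correct only when its span matches a gold span exactly, and that the average is an unweighted mean over datasets. The names `f1_score`, `average_f1`, and `toy_results`, and all the spans in the toy data, are hypothetical and are not taken from the TopoBERT scripts or evaluation data.

```python
from typing import Dict, Set, Tuple

# A toponym is represented by its (start, end) character offsets (assumption).
Span = Tuple[int, int]


def f1_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1 between gold and predicted toponym spans."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_f1(results: Dict[str, Tuple[Set[Span], Set[Span]]]) -> float:
    """Unweighted mean of the per-dataset F1 scores."""
    scores = [f1_score(gold, predicted) for gold, predicted in results.values()]
    return sum(scores) / len(scores)


# Hypothetical toy data: dataset name -> (gold spans, predicted spans).
toy_results = {
    "dataset_a": ({(0, 6), (10, 15)}, {(0, 6), (10, 15)}),  # every span found
    "dataset_b": ({(3, 9), (20, 27)}, {(3, 9)}),            # one span missed
}

if __name__ == "__main__":
    for name, (gold, predicted) in toy_results.items():
        print(name, round(f1_score(gold, predicted), 3))
    print("average", round(average_f1(toy_results), 3))
```

An unweighted mean gives each dataset equal influence regardless of its size; pooling all spans before scoring would instead favor the larger datasets, so the two conventions can yield different averages.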

Full Text