Abstract

Human and natural processes such as navigation and natural calamities are intrinsically linked to geographic space and are described using place names. Extracting place names from text and subsequently geocoding them is critical for understanding the onset, progression, and end of these processes. Geocoding place names extracted from text requires an external knowledge base such as a gazetteer. However, a standard gazetteer is typically incomplete. Additionally, widely used approaches to place name geocoding (also known as toponym resolution) generally focus on place names that are ambiguous but already present in a gazetteer. Hence, there is a need for an approach that automatically geocodes non-gazetteer place names. In this research, we demonstrate that patterns in place names are not spatially random. Places are often named after the people, geography, and history of an area, and their names thus exhibit a degree of similarity. Similarly, places that co-occur in text are likely to be spatially proximate, as they provide geographic reference to common events. We propose a novel data-driven, spatially aware algorithm, Bhugol, that leverages the spatial patterns and the spatial context of place names to automatically geocode non-gazetteer place names. The efficacy of Bhugol is demonstrated using two diverse geographic areas: the USA and India. The results show that Bhugol outperforms well-known state-of-the-art geocoders.
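The two intuitions stated above (similar names tend to cluster in space, and co-occurring places tend to be nearby) can be illustrated with a minimal sketch. This is not the Bhugol algorithm, whose details are not given here; it is a toy weighted-centroid heuristic under stated assumptions. The function names `name_similarity` and `estimate_location`, the `1.0 +` weighting scheme, and the gazetteer entries with their coordinates are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity in [0, 1] between two place names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def estimate_location(unknown_name, cooccurring_names, gazetteer):
    """Estimate (lat, lon) for a place name that is missing from the gazetteer.

    Assumption: the estimate is a weighted centroid of the gazetteer places
    that co-occur with the unknown name in the same text. Each place gets a
    base weight of 1.0 (spatial context) plus its name similarity to the
    unknown name (spatial pattern in names). Returns None when no
    co-occurring name is found in the gazetteer.
    """
    total_weight = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for name in cooccurring_names:
        if name not in gazetteer:
            continue  # co-occurring name is itself unknown; skip it
        lat, lon = gazetteer[name]
        weight = 1.0 + name_similarity(unknown_name, name)
        total_weight += weight
        lat_sum += weight * lat
        lon_sum += weight * lon
    if total_weight == 0.0:
        return None
    # Plain averaging of coordinates is only reasonable for small areas
    # that do not cross the antimeridian or the poles.
    return (lat_sum / total_weight, lon_sum / total_weight)


# Invented gazetteer entries with made-up coordinates (not real places).
TOY_GAZETTEER = {
    "Northfield": (10.0, 20.0),
    "Eastbrook": (10.4, 20.2),
    "Farport": (30.0, 40.0),
}

estimate = estimate_location(
    "Northfield Heights", ["Northfield", "Eastbrook"], TOY_GAZETTEER
)
print(estimate)
```

In this toy setup the estimate falls between the two co-occurring places and is pulled toward "Northfield", because that name is more similar to the unknown name. A real system would need a proper distance model and a learned notion of name similarity rather than raw string matching.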
