Abstract

Natural hazard exposure modelling involves constructing databases that describe the elements (people and the built environment) exposed to a given hazard in a selected location. These databases are often constructed using information from censuses, cadastral data, or satellite imagery. In this work, we propose complementing hazard exposure modelling with an alternative and unconventional data source: the text components of building permits. The proposed methodology, Natural Language Processing for the Global Exposure Database (NLP4GED), adopts natural language processing techniques to extract building-by-building exposure attributes in line with the GED4ALL taxonomy (Global Exposure Database for ALL). The methodology has three steps: a classifier filters permits that potentially contain exposure information; a clustering algorithm identifies semantically similar permits; and regular expressions (regex) extract the exposure attributes. As an illustrative application, we apply NLP4GED to wrangle an unstructured real-world dataset of 100,989 building permits in Malta. The application yields relevant exposure attributes (i.e., year of construction, building height, and occupancy) for 23,076 buildings, presented in a geographic information system (GIS) environment.
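The final, regex-based extraction step can be illustrated with a minimal sketch. The abstract does not give the actual patterns, keywords, or attribute names used in NLP4GED, so everything below (`STOREY_PATTERN`, `OCCUPANCY_KEYWORDS`, `extract_attributes`, and the sample permit wording) is a hypothetical assumption chosen only to show the general idea of pulling building height and occupancy out of free-text permit descriptions.

```python
import re

# Hypothetical pattern: a count (digits or a small number word) followed by
# "storey", "story" or "floor". Not the pattern used in the paper.
STOREY_PATTERN = re.compile(
    r"\b(\d+|one|two|three|four|five)[\s-]*(?:storey|story|floor)s?\b"
)
WORD_TO_INT = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Hypothetical keyword lists mapping permit wording to an occupancy label.
OCCUPANCY_KEYWORDS = {
    "residential": ("dwelling", "apartment", "maisonette", "residence"),
    "commercial": ("shop", "office", "showroom"),
}


def extract_attributes(permit_text: str) -> dict:
    """Return the storey count and occupancy label found in a permit description."""
    text = permit_text.lower()
    attributes = {"storeys": None, "occupancy": None}

    # Building height, expressed as a number of storeys.
    match = STOREY_PATTERN.search(text)
    if match:
        token = match.group(1)
        attributes["storeys"] = int(token) if token.isdigit() else WORD_TO_INT[token]

    # Occupancy: the first label with a matching keyword (singular or plural) wins.
    for label, keywords in OCCUPANCY_KEYWORDS.items():
        if any(re.search(rf"\b{kw}s?\b", text) for kw in keywords):
            attributes["occupancy"] = label
            break

    return attributes
```

In the methodology as described, a function of this kind would only be applied to permits that have already passed the classifier and been grouped by the clustering step, so that each cluster of similarly worded permits can be served by its own small set of patterns.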
