A semi-automatic natural language tool to minimize systematic biases in geo-hydrological disaster datasets in tropical Africa

Bram Valkenborg, Olivier Dewitte, Benoît Smets

doi:10.5194/egusphere-egu24-7652

Abstract

Tropical Africa is highly susceptible to geo-hydrological hazards, yet these hazards and their impacts remain poorly documented in existing disaster databases. Only high-impact events that receive wide attention are reported, and they are reported manually, which creates systematic biases. Natural Language Processing (NLP) has the potential to automate the documentation of geo-hydrological disasters. This research focuses on developing a semi-automated tool that extracts information from online press articles and social media posts. Fine-tuned Large Language Models perform a series of tasks, such as question answering, zero-shot classification, and named-entity recognition, to extract information from these online sources. A three-step approach is proposed for the detection of events: (1) filtering posts or articles for relevance, (2) extracting information on location, timing, and impact, and (3) merging and sorting this information to document the identified events in a structured disaster database. Shortcomings compared with a manual approach remain, mainly related to the complexity of the text and to toponymic ambiguity when geocoding events. The tool is therefore complementary to other information-gathering approaches. These new sources of information will improve our understanding of the distribution of disasters related to geo-hydrological hazards, especially in data-scarce contexts. Future work will combine this semi-automated tool with remote sensing and citizen science data to further reduce systematic biases in disaster datasets.
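
The three-step approach can be sketched in Python as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. The fine-tuned LLM classifier of step 1 is replaced by a simple keyword match (`keyword_classifier`, `HAZARD_KEYWORDS`). Step 2 (question answering and named-entity recognition) is assumed to have already produced `Record` objects holding a geocoded location, a date, and a death toll. Step 3 merges records that report the same location within an assumed time window (`window_days`). All names, fields, and toy values are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable

# Hypothetical stand-in for the fine-tuned zero-shot classifier of step 1.
HAZARD_KEYWORDS = ("flood", "landslide", "mudslide", "inundation")


def keyword_classifier(text: str) -> bool:
    """Step 1 stand-in: flag a text as relevant if it mentions a hazard keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in HAZARD_KEYWORDS)


def filter_relevant(texts: Iterable[str], is_relevant: Callable[[str], bool]) -> list[str]:
    """Step 1: keep only the posts or articles the classifier marks as relevant."""
    return [text for text in texts if is_relevant(text)]


@dataclass
class Record:
    """Assumed output of step 2 (QA/NER) for one relevant text."""
    location: str      # toponym, assumed already geocoded and normalized
    event_date: date   # date of the reported event
    deaths: int        # reported death toll


@dataclass
class Event:
    """One entry of the structured disaster database produced by step 3."""
    location: str
    start: date
    end: date
    deaths: int
    n_sources: int


def merge_records(records: Iterable[Record], window_days: int = 3) -> list[Event]:
    """Step 3: merge records reporting the same location within `window_days`."""
    events: list[Event] = []
    # Sorting by location, then date, places reports of the same event next to each other.
    for rec in sorted(records, key=lambda r: (r.location, r.event_date)):
        last = events[-1] if events else None
        if (
            last is not None
            and last.location == rec.location
            and (rec.event_date - last.end).days <= window_days
        ):
            last.end = max(last.end, rec.event_date)
            # Keep the highest reported toll instead of summing duplicate reports.
            last.deaths = max(last.deaths, rec.deaths)
            last.n_sources += 1
        else:
            events.append(Event(rec.location, rec.event_date, rec.event_date, rec.deaths, 1))
    return events


if __name__ == "__main__":
    posts = ["Landslide hits Town A after heavy rain", "Football final played in Town B"]
    print(filter_relevant(posts, keyword_classifier))
    # Toy records, as step 2 might return them (illustrative values only).
    records = [
        Record("Town A", date(2023, 5, 4), 5),
        Record("Town A", date(2023, 5, 5), 7),
        Record("Town A", date(2023, 5, 20), 1),
        Record("Town B", date(2023, 5, 4), 2),
    ]
    for event in merge_records(records):
        print(event)
```

Keeping the highest reported death toll rather than summing across sources avoids double-counting when several articles describe the same event. Matching on an exact location string is a simplification: toponymic ambiguity in geocoding is one of the main limitations the abstract identifies.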
