Abstract

Despite the known financial, economic, and humanitarian impacts of hurricanes and the floods that follow, datasets of floods and flood risk reduction projects are either small in scope, lacking in detail, or privately held by commercial entities. However, with the amount of online data growing exponentially, we have seen a rise in information extraction techniques that derive insights from unstructured text. Social media in particular has seen a tremendous increase in popularity, yet it has proven unreliable and difficult to extract complete information from. In contrast, online newspaper articles are often vetted by journalists and contain finer details. In this paper, we therefore leverage Natural Language Processing (NLP) to create a hybrid Named-Entity Recognition (NER) model that employs a domain-specific machine learning model, linguistic features, and rule-based matching to extract information from newspapers. To the knowledge of the authors, this model is the first of its kind to extract detailed flooding information and risk reduction projects over the entire contiguous United States. The approach expands upon previous similar works by widening the geographical coverage and applying techniques to extract information from large documents, with minimal accuracy loss relative to the previous methods. Specifically, our model is able to extract information such as street closures, project costs, and metrics. Our validation indicates an F1 score of 72.13% for the NER model entity extraction, a binary classification location filter with a score of 73%, and an overall performance only 8.4% lower than that of a human validator against a gold standard. Through this process, we find the locations of 27,444 streets, 181,076 flood risk reduction projects, and 435,353 storm locations throughout the United States over the past two decades.
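To make the rule-based matching component concrete, the following is a minimal sketch and not the authors' implementation. It assumes simple regular expressions for two of the entity types named above (project costs and streets); the patterns, the labels `COST` and `STREET`, and the function name `extract_rule_based` are all hypothetical. In a hybrid model of the kind described, such rules would complement the domain-specific machine learning model and the linguistic features rather than replace them.

```python
import re

# Hypothetical patterns for illustration only; the paper's actual rules are not given.
# Dollar amounts such as "$2.5 million" or "$40,000".
COST_RE = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d+)?(?:\s(?:thousand|million|billion))?",
    re.IGNORECASE,
)
# One or more capitalized words followed by a common street suffix.
STREET_RE = re.compile(
    r"\b(?:[A-Z][a-z]+\s)+(?:Street|Avenue|Road|Boulevard|Drive|Lane|Highway)\b"
)


def extract_rule_based(text):
    """Return (label, matched_text) pairs found by the illustrative rules."""
    entities = [("COST", m.group(0)) for m in COST_RE.finditer(text)]
    entities += [("STREET", m.group(0)) for m in STREET_RE.finditer(text)]
    return entities


sentence = (
    "Crews closed Main Street after the storm; "
    "the $2.5 million levee project resumes Monday."
)
for label, value in extract_rule_based(sentence):
    print(label, value)
```

Rules like these are precise for well-formatted mentions but miss paraphrases, which is one plausible reason to combine them with a learned model.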
