Abstract

Several systems for compiling landslide inventories exist, but they rarely rely on automated or real-time updates. Mass media can provide reliable information about natural hazard events with a relatively high temporal and spatial resolution. News about a natural disaster published in newspapers or on crowdsourcing platforms allows faster observation, survey, and classification of these phenomena. Several techniques have been developed for mining social media data on many natural events, but they have rarely been applied to the automatic extraction of “landslide events”. This source of information provides continuous feedback from the real world, and news concerning landslide events can be collected rapidly. In this work, newspaper articles about landslides in Italy are automatically collected by an existing data mining algorithm based on a semantic engine. The news has been analysed to assess its distribution over the territory and to verify whether it can be used for hazard mapping. In 10 years, from 2010 to 2019, the algorithm identified and geolocated 184322 articles referring to 32525 generic events (“news”). The collected data first underwent manual verification, followed by a classification based on news relevance, localization accuracy and time of publication. These data were then used to identify the areas and the periods most affected by landslide phenomena. The analyses show that almost 42% of Italian municipalities have been affected by landslides. According to the results, data mining is helpful for creating landslide databases in which the day and the approximate location (municipality) of the possible landslide triggers are known. Such a database, in turn, can be used for scientific purposes, such as defining the meteorological conditions associated with landslide initiation or validating risk maps. It can also be used for proper land use or risk mitigation planning, since the most landslide-prone municipalities can be identified.
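
The aggregation step described in the abstract (from verified news to the most affected areas and periods) can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`NewsEvent` and its fields), the helper names and the sample records are all hypothetical, and only the totals of 184322 articles and 32525 news come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Totals reported in the abstract for 2010-2019.
ARTICLES = 184322
NEWS = 32525


@dataclass(frozen=True)
class NewsEvent:
    """One geolocated news item. The field names are hypothetical."""
    municipality: str  # municipality the news was geolocated to
    published: date    # day of publication
    verified: bool     # True if manual verification confirmed a landslide


def counts_by_municipality(events):
    """Number of verified news per municipality."""
    return Counter(e.municipality for e in events if e.verified)


def counts_by_month(events):
    """Number of verified news per (year, month) of publication."""
    return Counter((e.published.year, e.published.month)
                   for e in events if e.verified)


def affected_share(events, n_municipalities):
    """Fraction of municipalities with at least one verified news."""
    affected = {e.municipality for e in events if e.verified}
    return len(affected) / n_municipalities


# Made-up sample records with placeholder municipality names.
sample = [
    NewsEvent("A", date(2014, 11, 10), True),
    NewsEvent("A", date(2014, 11, 12), True),
    NewsEvent("B", date(2016, 3, 1), True),
    NewsEvent("C", date(2016, 3, 2), False),  # rejected during verification
]

print(counts_by_municipality(sample).most_common(1))
print(counts_by_month(sample))
print(affected_share(sample, n_municipalities=4))
print(round(ARTICLES / NEWS, 2))  # mean number of articles per news
```

Applied to the full verified dataset, the same kind of counts would rank municipalities by number of news and give the share of municipalities with at least one verified event. From the reported totals, each news corresponds on average to about 5.7 articles.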

Highlights

  • Landslides are extremely widespread in the Italian territory, and they are, along with floods, the most frequent natural hazard, causing the greatest number of losses of human lives and damages to properties and infrastructures (Guzzetti 2000)

  • Semantic Engine to Classify and Geotagging News (SECaGN) is an algorithm based on a mechanism for acquiring, managing and publishing online articles related to natural hazards

  • From 2010 to 2019, the data mining algorithm gathered 32525 news items

Summary

Introduction

Landslides are extremely widespread in the Italian territory and, along with floods, they are the most frequent natural hazard, causing the greatest loss of human life and damage to property and infrastructure (Guzzetti 2000). Landslide research relies on landslide inventories for a multitude of spatial, temporal or process analyses (Van Den Eeckhaut and Hervás 2012; Kirschbaum et al 2015; Klose et al 2015). These inventories can be created with several methods, such as photo-interpretation, field surveys (Brunsden 1985), remote sensing (Soeters and Van Westen 1996; McKean and Roering 2003; Lu et al 2012; Bianchini et al 2018; Solari et al 2020), retrieval of data from technical reports and/or newspapers (Kirschbaum et al 2010; Görüm and Fidan 2021; Guzzetti et al 2008; Klimeš et al 2017; Vennari et al 2014; Rosi et al 2019), or a combination of them (Dikau et al 1996; Rosi et al 2012; Rosser et al 2017).
