Abstract
The selection of unburned labels is a crucial step in machine learning modelling of wildfire occurrence probability. However, the effect of different sampling strategies on the performance of machine learning methods has not yet been thoroughly investigated. Additionally, whether the ratio of burned to unburned labels should be balanced or imbalanced remains controversial. To address these gaps in the literature, we examined the effects of four widely used sampling strategies for selecting unburned labels: (1) random selection within the unburned areas, (2) selection of areas with only one fire event, (3) selection of barren areas, and (4) selection of areas determined by the semi-variogram geostatistical technique. We also investigated the effect of a balanced versus an imbalanced ratio between burned and unburned labels. The random forest (RF) method was used to explore the relationships between historical wildfires that occurred between 2001 and 2020 in Yunnan Province, China, and climate, topography, fuel and anthropogenic variables. Multiple metrics demonstrated that randomly selecting unburned labels from the unburned areas, combined with an imbalanced dataset, outperformed the other three sampling strategies. We therefore recommend this strategy for producing the datasets required for machine learning modelling of wildfire occurrence probability.
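To illustrate strategy (1), the sketch below draws unburned labels at random from grid cells with no recorded fire. The unburned-to-burned ratio can be set to 1 (balanced) or to another value (imbalanced). This is a minimal sketch under stated assumptions, not the authors' pipeline. The grid-cell representation, the `burn_counts` dictionary, the `sample_labels` function and its `ratio` parameter are all hypothetical. The abstract also does not state which imbalanced ratio was used.

```python
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # hypothetical (row, col) index of a grid cell


def sample_labels(
    burn_counts: Dict[Cell, int], ratio: float = 1.0, seed: int = 0
) -> Tuple[List[Cell], List[Cell]]:
    """Return (burned, unburned) cells for training a classifier.

    burn_counts: hypothetical mapping from cell to number of fires observed
                 over the study period.
    ratio: unburned-to-burned ratio; 1.0 yields a balanced set, any other
           value yields an imbalanced set.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    # Every cell with at least one recorded fire is a burned label.
    burned = sorted(c for c, n in burn_counts.items() if n > 0)
    # Candidate unburned labels: cells with no recorded fire.
    candidates = sorted(c for c, n in burn_counts.items() if n == 0)
    # Number of unburned labels requested, capped by what is available.
    k = min(len(candidates), round(ratio * len(burned)))
    # Random selection without replacement (strategy 1), seeded for reproducibility.
    rng = random.Random(seed)
    unburned = rng.sample(candidates, k)
    return burned, unburned


if __name__ == "__main__":
    # Toy 10 x 10 grid: 10 burned cells, 90 unburned cells.
    grid = {(r, c): 1 if (r + c) % 10 == 0 else 0 for r in range(10) for c in range(10)}
    b, u_bal = sample_labels(grid, ratio=1.0)
    _, u_imb = sample_labels(grid, ratio=3.0)
    print(len(b), len(u_bal), len(u_imb))
```

The next step would be to pass the resulting burned and unburned cells, together with their climate, topography, fuel and anthropogenic predictors, to a random forest classifier. That step is omitted here.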