Abstract

Numerous machine learning (ML) models already predict wildfire occurrence or susceptibility (e.g., Forkel et al., 2019; Cilli et al., 2022). Most only make predictions, without post-hoc interpretation of the model or quantification of the reliability of individual predictions. We aim to go beyond the predicted values and learn about wildfire drivers from the model itself. Our main goal is to discover interpretable patterns in wildfire data from which knowledge can be extracted. Our approach combines state-of-the-art methods for feature attribution, dimensionality reduction and clustering to identify the most representative decisions of the ML model that lead to its predictions. We introduce a novel, multi-stage clustering methodology for subgroup discovery based on SHAP (SHapley Additive exPlanations) values, UMAP (Uniform Manifold Approximation and Projection) dimensionality reduction and hierarchical density-based clustering (HDBSCAN). With this approach, it is possible to identify a group of parameters that can be used to predict whether or not a wildfire is expected to occur. To this end, we build upon existing datasets of fire occurrences in the Netherlands and Italy. Central to our methodology is the use of SHAP values to define subgroups (i.e., combinations of parameter values that describe whether or not a fire occurs). Clustering on SHAP values rather than on raw feature values removes noisy information from the dataset while preserving the aspects crucial to clustering, and it mitigates the effect of fluctuations in feature values that contribute little to the model outcome. We enhance clustering performance and the interpretability of results by using UMAP to reduce the multidimensional SHAP values to two dimensions before clustering. We construct decision rules that identify and differentiate each cluster, which yields highly discriminative and easily interpretable subgroup descriptions.
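To illustrate why attribution values damp the influence of weakly contributing features, the following minimal sketch computes exact Shapley values for one instance of a toy model. Everything in it is an assumption made for illustration: the `toy_fire_score` function, its four feature names and the all-zero baseline are hypothetical and not taken from the study. The single-baseline formulation is also a simplification of SHAP, which averages over a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a single baseline.

    Features outside a coalition are replaced by their baseline value
    (a simplification; SHAP averages over a background dataset instead).
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight shared by all coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Hypothetical toy model; feature order is [temperature, dryness, wind, noise].
def toy_fire_score(z):
    temperature, dryness, wind, noise = z
    return 2.0 * temperature + dryness * wind + 0.01 * noise


x = [1.0, 1.0, 1.0, 5.0]          # instance to explain (made-up values)
baseline = [0.0, 0.0, 0.0, 0.0]   # assumed reference point
phi = shapley_values(toy_fire_score, x, baseline)
```

In this toy setting the `noise` feature changes by 5 units yet receives an attribution of only about 0.05, whereas `temperature` receives 2.0 and the interacting `dryness` and `wind` features receive 0.5 each. Instances that differ mainly in `noise` would therefore lie close together in attribution space, even though they are far apart in the raw feature space.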
Clustering in SHAP space prevents the large, overlapping rule sets that often arise when clustering is based on the raw feature space and that require manual filtering by experts. We validated our approach and results against a more conventional procedure that clusters directly in the feature space, skipping the ML model and the calculation of SHAP values. As a supplement to the decision rules, the model's effectiveness is assessed for each prediction in every subgroup. This differs from the conventional approach, which relies on performance metrics computed over an entire test set. Based on the two case studies, we conclude that supervised clustering effectively characterizes wildfire occurrence, attributing it to a set of influencing factors in both the feature space and the spatial domain. Our approach also provides valuable insights into the performance of the ML model under diverse conditions, highlighting situations where predictions demand careful consideration.

Forkel, M., Andela, N., Harrison, S. P., et al. (2019). Emergent relationships with respect to burned area in global satellite observations and fire-enabled vegetation models. Biogeosciences, 16, 57–76. https://doi.org/10.5194/bg-16-57-2019

Cilli, R., Elia, M., D’Este, M., et al. (2022). Explainable artificial intelligence (XAI) detects wildfire occurrence in the Mediterranean countries of Southern Europe. Scientific Reports, 12, 16349. https://doi.org/10.1038/s41598-022-20347-9
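The contrast drawn in the abstract between a single test-set metric and a per-subgroup assessment can be sketched as follows. This is one simple reading of the idea, using accuracy aggregated per subgroup; the labels, predictions and cluster ids are made up, the function name `accuracy_by_subgroup` is hypothetical, and the study may well use a different metric.

```python
from collections import defaultdict


def accuracy_by_subgroup(y_true, y_pred, subgroup):
    """Return {subgroup label: fraction of correct predictions in that subgroup}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, subgroup):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Made-up data: 1 = fire, 0 = no fire; subgroup ids stand in for cluster labels.
y_true   = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred   = [1, 1, 1, 1, 0, 1, 1, 0]
subgroup = [0, 0, 0, 0, 1, 1, 1, 1]

# Conventional view: one number for the whole test set.
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Subgroup view: one number per cluster.
per_group = accuracy_by_subgroup(y_true, y_pred, subgroup)
```

With these made-up values the overall accuracy is 0.75, which hides the fact that subgroup 0 is predicted perfectly while subgroup 1 is correct only half of the time. That second subgroup is the kind of situation in which predictions would demand careful consideration.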
