Abstract

Predicting post-fire tree mortality is a major area of research in fire-prone forests, woodlands, and savannas worldwide. Past research has relied overwhelmingly on logistic regression (LR) analysis, which predicts post-fire tree status as a binary outcome (i.e., living or dead). One of the most problematic issues for LR (or any classification method) arises when there is a class imbalance in the training data; in these instances, predictions will be biased toward the majority class. Using a historical prescribed fire data set of longleaf pines (Pinus palustris) from northern Florida, USA, we compare results from standard LR and the machine-learning algorithm random forest (RF). First, we demonstrate the class imbalance problem using simulated data. We then show how a balanced RF model can be used to alleviate the bias and improve mortality prediction results. In the simulated example, LR model sensitivity and specificity were clearly biased by the degree of imbalance between the classes, whereas the balanced RF models had consistent sensitivity and specificity across the simulated data sets. Re-analyzing the original longleaf pine data set with a balanced RF model showed that although both LR and RF models had similar areas under the receiver operating characteristic curve (AUC), the RF model had better discrimination for predicting new observations of dead trees. Both LR and RF models identified duff consumption and percent crown scorch as important predictors of tree mortality; however, the RF model also suggested pre-fire duff depth as an important predictor. Our analysis highlights the limitations of LR when data are imbalanced and supports using RF to develop post-fire tree mortality models. We suggest how RF can be incorporated into future tree mortality studies and implemented in future decision-support tools.
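To make the class-imbalance effect described above concrete, the sketch below (not taken from the paper) illustrates it with synthetic data, assuming Python with scikit-learn. It trains a standard LR and a class-weighted RF at increasing imbalance in the minority ("dead") class and reports sensitivity and specificity for each. The class_weight='balanced_subsample' setting is an assumption standing in for the paper's balanced RF, whose implementation the abstract does not specify; a balanced RF more commonly down-samples the majority class within each bootstrap, and reweighting is only an approximation of that.

```python
# Minimal sketch (assumed scikit-learn workflow, hypothetical simulated data;
# not the longleaf pine data set) of how class imbalance biases logistic
# regression toward the majority class, and how a class-balanced random
# forest keeps sensitivity and specificity more stable.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

SEED = 42

for dead_fraction in (0.5, 0.2, 0.05):  # increasing imbalance in the "dead" class
    X, y = make_classification(
        n_samples=5000, n_features=8, n_informative=4,
        weights=[1 - dead_fraction, dead_fraction], random_state=SEED,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, stratify=y, test_size=0.3, random_state=SEED,
    )

    lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # 'balanced_subsample' reweights classes within each bootstrap sample;
    # it approximates (but is not identical to) a balanced RF that
    # down-samples the majority class per tree.
    rf = RandomForestClassifier(
        n_estimators=500, class_weight="balanced_subsample", random_state=SEED,
    ).fit(X_tr, y_tr)

    for name, model in (("LR", lr), ("balanced RF", rf)):
        tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
        sensitivity = tp / (tp + fn)  # true-positive rate: dead trees correctly flagged
        specificity = tn / (tn + fp)  # true-negative rate: live trees correctly flagged
        print(f"{dead_fraction:>4.0%} dead | {name:>11}: "
              f"sens={sensitivity:.2f} spec={specificity:.2f}")
```

Under these assumptions, LR sensitivity for the minority class typically degrades as imbalance grows while its specificity stays high, whereas the balanced RF keeps the two metrics more comparable across imbalance levels, mirroring the pattern the abstract reports for the simulated data sets.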
