Forest fires have emerged as a considerable challenge in the United States, posing substantial societal, economic, and environmental risks. Early and accurate prediction of these fires is therefore imperative for management efforts. In this study, we began with two Kaggle datasets: the “Algerian Forest Fires Dataset,” with fire readings from 2012, and the “Forest Fires Data Set,” with readings from 2007. Because the second dataset was originally intended for a regression task (its target is an approximate burned-area value rather than a fire/no-fire label), we phased it out while developing our final model. Ultimately, we used only the Algerian Forest Fires Dataset, which contains 13 attributes and 244 instances from two regions of Algeria. To streamline the analysis, we reduced the number of features to five: month, temperature, humidity, wind, and rain. We then developed a Random Forest Classifier to predict the occurrence of a forest fire, using this dataset for both training and testing. Its performance was compared against Decision Tree, Logistic Regression, and Artificial Neural Network models using cross-validation. In this experiment the Random Forest Classifier performed slightly better than the other models, achieving an accuracy of 86.486% and an F1 score of 88.889%. Our approach outputs a decimal value representing the probability of a fire. Overall, this research contributes to the advancement of forest fire prediction technologies by leveraging meteorological data.
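As a rough illustration of the workflow described in the abstract, the following minimal sketch trains a Random Forest Classifier on the five named features, scores it with cross-validation, and returns a fire probability per sample. It assumes scikit-learn and NumPy. The data are synthetic placeholders, not the Algerian Forest Fires Dataset, and the labelling rule, the 15% hold-out split, the 5-fold setting, and the hyperparameters are all assumptions, since the abstract does not state them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# The five features named in the abstract (column order is an assumption).
FEATURES = ["month", "temperature", "humidity", "wind", "rain"]

# Synthetic placeholder data, NOT the real dataset: 244 rows, 5 columns,
# drawn from arbitrary ranges purely so the sketch runs end to end.
rng = np.random.default_rng(0)
n = 244
X = np.column_stack([
    rng.integers(6, 10, n),    # month
    rng.uniform(22, 42, n),    # temperature
    rng.uniform(20, 90, n),    # humidity
    rng.uniform(6, 29, n),     # wind
    rng.exponential(0.8, n),   # rain
])
# Made-up labelling rule (hot, dry, rain-free rows are more likely "fire").
score = X[:, 1] - 0.2 * X[:, 2] - 3.0 * X[:, 4] + rng.normal(0, 2, n)
y = (score > 18).astype(int)   # 1 = fire, 0 = no fire

# Hold out a test set; the 15% fraction is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0
)

# Hyperparameters are assumptions, not the authors' settings.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy on the training portion.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

# Fit on the training portion, then take the probability of the "fire" class.
model.fit(X_train, y_train)
fire_probability = model.predict_proba(X_test)[:, 1]

print("mean CV accuracy:", cv_scores.mean())
print("first five fire probabilities:", fire_probability[:5])
```

The same `cross_val_score` call could be repeated with `DecisionTreeClassifier`, `LogisticRegression`, or `MLPClassifier` to mirror the model comparison mentioned above. Scores obtained on this synthetic data say nothing about the reported results.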