Building spatial prediction models to understand the mechanisms and risks of forest fires is an important means of controlling them. Non-fire point data are essential training data for such models, and their quality significantly affects prediction performance. However, non-fire point data obtained with existing sampling methods are generally poorly representative. This study therefore proposes a non-fire point sampling method based on geographical similarity to improve the quality of non-fire point samples. The method rests on the idea that the less similar the geographical environment of a candidate point is to that of recorded fire points, the greater the confidence that the candidate is a true non-fire point sample. Yunnan Province, China, where forest fires are frequent, was used as the study area. We compared the prediction performance of traditional sampling methods and the proposed method using three commonly used forest fire risk prediction models: logistic regression (LR), support vector machine (SVM), and random forest (RF). The results show that models built on samples from the proposed method achieve significantly higher modeling and prediction accuracies than models built on traditionally sampled data. Specifically, the modeling and prediction accuracies improved by 19.1% and 32.8%, respectively, in 2010, and by 13.1% and 24.3%, respectively, in 2020. We therefore believe that collecting non-fire point samples according to the principle of geographical similarity is an effective way to improve the quality of forest fire samples and thus enhance the prediction of forest fire risk.
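The sampling idea can be illustrated with a minimal Python sketch. The abstract does not specify how similarity is computed, so everything below is an assumption made for illustration: a Gaussian similarity per environmental covariate, a minimum across covariates to obtain the overall similarity, the maximum over all fire points as the similarity of a candidate to the fire record, and a hypothetical reliability threshold of 0.5. The function names `non_fire_reliability` and `sample_non_fire` are likewise hypothetical and are not taken from the study.

```python
import numpy as np


def non_fire_reliability(candidates, fire_points):
    """Score candidate locations by dissimilarity to recorded fire points.

    candidates:  (n, k) environmental covariates at candidate locations
    fire_points: (m, k) the same covariates at recorded fire locations
    Returns an (n,) array in [0, 1]; higher means more confidently non-fire.
    """
    candidates = np.asarray(candidates, dtype=float)
    fire_points = np.asarray(fire_points, dtype=float)

    # Assumed scale: per-covariate standard deviation over the fire points.
    scale = fire_points.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant covariates

    # Pairwise per-covariate differences, shape (n, m, k).
    diff = candidates[:, None, :] - fire_points[None, :, :]

    # Assumed Gaussian similarity per covariate, in (0, 1].
    sim_per_covariate = np.exp(-0.5 * (diff / scale) ** 2)

    # Assumed limiting-factor rule: overall similarity is the minimum
    # across covariates, shape (n, m).
    sim = sim_per_covariate.min(axis=2)

    # Similarity to the most similar fire point, then invert it:
    # the less similar to any fire point, the higher the reliability.
    return 1.0 - sim.max(axis=1)


def sample_non_fire(candidates, fire_points, threshold=0.5):
    """Return indices of candidates whose reliability meets the threshold."""
    reliability = non_fire_reliability(candidates, fire_points)
    return np.flatnonzero(reliability >= threshold)


if __name__ == "__main__":
    # Toy covariates (for example elevation and a vegetation index), made up here.
    fires = [[0.0, 0.0], [1.0, 1.0]]
    cands = [[0.0, 0.0], [10.0, 10.0], [1.0, 1.0]]
    print(non_fire_reliability(cands, fires))
    print(sample_non_fire(cands, fires))
```

In this sketch a candidate whose covariates match a recorded fire point receives a reliability near zero and is excluded, while a candidate far from every fire point in covariate space is retained as a non-fire sample. The retained points could then be combined with the fire points to train any of the LR, SVM, or RF models mentioned above.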