Background
Forest fires are a common disaster, whether natural or human-caused. They not only destroy valuable forest resources but also alter the structure and function of forest ecosystems and affect the global climate. Effective information extraction, data management, and modeling of historical fire events are therefore essential for forecasting forest fires and mitigating their onset and progression. Traditional remote sensing-based fire detection and risk prediction operate at the pixel scale and overlook the integrity of fire behaviour at the level of the whole event. Moreover, fire cause records typically come from ground reports and do not fully exploit high-temporal-resolution thermal infrared remote sensing products.

Methods
We propose a framework to identify the causes of forest fire events and test it on fire incidents from 2001 to 2020 in the Daxing'anling region of Northeast China. First, the Jenk-DBSCAN model was used to extract fire footprints representing individual forest fire events from the MODIS MCD64A1 product, together with footprint-based event attributes such as ignition location and timing. Fire footprints that matched ground survey records were then selected, and two environmental variable datasets (covering meteorological, topographical, fuel, and human activity factors) were derived, one at each footprint's origin point and one at its center point, to account for the spatio-temporal differences between the two. Finally, the optimal variable combination for each dataset was fed to three binary classifiers, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM), to distinguish natural from human-induced fires, and the accuracy of the resulting fire cause identification models was assessed.
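The footprint-extraction step can be illustrated with a minimal sketch. It uses a plain DBSCAN in space and time, not the Jenk-DBSCAN variant named above, and groups burned pixels into footprints before taking the earliest-burning pixel of each footprint as a proxy for its ignition point. The pixel tuples `(col, row, burn_day)`, the thresholds `eps_space`, `eps_days`, and `min_pts`, and all function names are assumptions made for illustration, not values or interfaces from the study.

```python
from collections import deque


def region_query(points, i, eps_space, eps_days):
    """Indices of pixels within eps_space (grid cells) and eps_days of pixel i."""
    xi, yi, di = points[i]
    return [
        j
        for j, (x, y, d) in enumerate(points)
        if (x - xi) ** 2 + (y - yi) ** 2 <= eps_space ** 2 and abs(d - di) <= eps_days
    ]


def dbscan(points, eps_space=1.5, eps_days=5, min_pts=3):
    """Plain DBSCAN over (col, row, burn_day) pixels; label -1 marks noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbors = region_query(points, i, eps_space, eps_days)
        if len(neighbors) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == noise:
                labels[j] = cluster  # former noise becomes a border pixel
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps_space, eps_days)
            if len(j_neighbors) >= min_pts:  # core pixel: keep expanding
                queue.extend(j_neighbors)
    return labels


def summarise_footprints(points, labels):
    """Per footprint: pixel count and earliest-burning pixel (ignition proxy)."""
    out = {}
    for p, lab in zip(points, labels):
        if lab == -1:
            continue
        fp = out.setdefault(lab, {"pixels": 0, "ignition": p})
        fp["pixels"] += 1
        if p[2] < fp["ignition"][2]:
            fp["ignition"] = p
    return out


# Hypothetical burned pixels: two separate fires and one isolated pixel.
pixels = [
    (0, 0, 120), (1, 0, 121), (1, 1, 122), (0, 1, 123),  # fire A
    (10, 10, 200), (11, 10, 201), (10, 11, 201),         # fire B
    (30, 30, 150),                                       # isolated pixel
]
labels = dbscan(pixels)
footprints = summarise_footprints(pixels, labels)
```

With these made-up pixels, the two fires come out as separate footprints and the isolated pixel is treated as noise. In practice, the distance and time thresholds would need tuning to the product's resolution and the region's fire regime.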
Results
The matched fire footprints showed high spatio-temporal consistency between model-extracted and ground-recorded fire events, with an overall R² of 0.87 in the area validation. The ignition dates extracted from fire footprints often preceded those reported by local agencies. Among the variables selected for fire cause identification, meteorological variables dominated, with daily mean dew point (DEWP) and daily mean visibility (VISIB) preferentially selected by most models. Across the six models (three classifiers applied to each of the two datasets), training AUC values were nearly all above 0.8, and the LR model showed higher validation accuracy than RF and SVM. Notably, the LR model was more accurate with the origin point-based dataset than with the center point-based dataset. The best model achieved an overall accuracy of 90.48%, with user’s accuracies of 92.86% for human-induced fires and 85.71% for natural fires.

Conclusions
The proposed framework offers a reliable method for enriching attribute information in forest fire management databases. The resulting fire footprints and origin points help characterize the progression of forest fire events more fully. Such efforts support the design of targeted forest fire prevention regulations and foster sustainable forest management.
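For readers unfamiliar with the accuracy measures reported in the Results, the sketch below shows how overall accuracy and user's accuracy are computed from a confusion matrix. User's accuracy is the share of samples predicted as a class that truly belong to it. The counts are hypothetical: they were chosen only so that the arithmetic reproduces the reported percentages, and they are not the study's actual validation data. The function name `accuracy_report` is likewise an assumption.

```python
def accuracy_report(confusion):
    """Overall and user's accuracy from confusion[predicted][actual] counts."""
    classes = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c][c] for c in classes)
    # User's accuracy: correct predictions of a class / all predictions of it.
    users = {c: confusion[c][c] / sum(confusion[c].values()) for c in classes}
    return correct / total, users


# Hypothetical counts (NOT from the study): rows are predicted, columns actual.
confusion = {
    "human": {"human": 13, "natural": 1},
    "natural": {"human": 1, "natural": 6},
}
overall, users = accuracy_report(confusion)
print(f"overall accuracy: {overall:.2%}")
for cause, ua in users.items():
    print(f"user's accuracy ({cause}): {ua:.2%}")
```

Because user's accuracy is conditioned on the predicted class, it describes how far a given "human" or "natural" label can be trusted when it is written into a fire management database.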