Combining spatial and sociodemographic regression techniques to predict residential fire counts at the census tract level

Tyler Buffington, James G Scott, Ofodike A Ezekoye

doi:10.1016/j.compenvurbsys.2021.101633

Abstract

This work examines different spatial and sociodemographic models for predicting residential fire counts at the census tract level for 118 U.S. fire departments across 25 states. The models give five-year forecasts of residential fire counts for 3392 census tracts, which together contain over 13 million residents. All models described in this paper are trained on fire incident data from the National Fire Incident Reporting System (NFIRS) over the interval 2006–2011 (inclusive) and are evaluated on their ability to predict the fire counts that occurred over the interval 2012–2016. Two strictly spatial models are considered: a simple “count” model that serves as a baseline for all other models described in the paper, and a model that uses kernel density estimation (KDE) with statistically optimized bandwidths. Using data from the American Community Survey (ACS), an examination of the effects of demographic and housing factors on fire risk is presented. The data suggest that the fire risk per person is generally higher in census tracts with attributes corresponding to socioeconomic disadvantage, such as low median incomes and small fractions of residents with college degrees. These trends inform the design of a Bayesian hierarchical Poisson regression model, which is shown to make predictions with a 9% lower root mean squared error (RMSE) than the base model. A spatial kernel regression is then conducted on the residuals of this regression, which results in a 15% RMSE improvement relative to the base model. These results are compared to a conditional autoregressive (CAR) model, which incorporates spatial information directly into the hierarchical Poisson regression. Although the CAR model's point-estimate forecasts have a higher RMSE than these approaches (7% lower than the base model), the CAR model allows for the generation of probabilistic forecasts and gives spatially informed statistical estimates of the effects of the sociodemographic variables. This work highlights the utility of geocoded fire incident and demographic data, as well as machine learning techniques that can use these datasets to make improved predictions.
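The evaluation described above, in which each model's RMSE is compared with that of a baseline "count" model, can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the baseline is taken to forecast each tract's count by rescaling its training-period count from six years (2006–2011) to five years (2012–2016), the function names are hypothetical, and the tract counts are invented toy numbers rather than NFIRS data.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean squared error between predicted and observed tract counts."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def count_baseline(train_counts, train_years=6, forecast_years=5):
    """Assumed baseline: rescale each tract's historical count to the forecast window."""
    return np.asarray(train_counts, dtype=float) * forecast_years / train_years

def rmse_improvement(model_rmse, base_rmse):
    """Percent reduction in RMSE relative to the baseline (positive means better)."""
    return 100.0 * (base_rmse - model_rmse) / base_rmse

# Toy data for four hypothetical census tracts (not real fire counts).
train_counts = np.array([12, 0, 6, 30])   # fires observed in the training window
observed = np.array([9, 1, 5, 22])        # fires observed in the forecast window

base_pred = count_baseline(train_counts)
base_rmse = rmse(base_pred, observed)
print("baseline forecast:", base_pred)
print("baseline RMSE:", round(base_rmse, 3))
```

Under this convention, a model whose RMSE is 0.85 times the baseline RMSE corresponds to a 15% improvement, which is how the percentages quoted in the abstract are read here.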

Full Text