West Nile virus (WNV) is the leading cause of mosquito-borne illness in the continental United States (CONUS). Spatial heterogeneity in historical incidence, environmental factors, and complex ecology make prediction of spatiotemporal variation in WNV transmission challenging. Machine learning provides promising tools for identification of important variables in such situations. To predict annual WNV neuroinvasive disease (WNND) cases in CONUS (2015-2021), we fitted 10 probabilistic models with variation in complexity from naïve to machine learning algorithm and an ensemble. We made predictions in each of nine climate regions on a hexagonal grid and evaluated each model's predictive accuracy. Using the machine learning models (random forest and neural network), we identified the relative importance and variation in ranking of predictors (historical WNND cases, climate anomalies, human demographics, and land use) across regions. We found that historical WNND cases and population density were among the most important factors while anomalies in temperature and precipitation often had relatively low importance. While the relative performance of each model varied across climatic regions, the magnitude of difference between models was small. All models except the naïve model had non-significant differences in performance relative to the baseline model (negative binomial model fit per hexagon). No model, including the ensemble or more complex machine learning models, outperformed models based on historical case counts on the hexagon or region level; these models are good forecasting benchmarks. Further work is needed to assess if predictive capacity can be improved beyond that of these historical baselines.
Read full abstract