Predicting relative species composition within mixed conifer forest pixels using zero-inflated models and Landsat imagery

Shannon L Savage, Rick L Lawrence, John R Squires

doi:10.1016/j.rse.2015.10.013

Abstract

Ecological and land management applications would often benefit from maps of the relative canopy cover of each species present within a pixel, instead of traditional remote-sensing-based maps of either dominant species or percent canopy cover (PCC) without regard to species composition. Widely used statistical models for remote sensing, such as randomForest (RF), support vector machines (SVM), and generalized linear models (GLM), are problematic for this purpose because they often fail to predict the absence of a target species correctly, especially in areas of high vegetation diversity. This failure stems from the relative abundance of absence observations (zero values) in the reference data used to train the models. Experience has shown that RF, SVM, and GLM models trained on such reference data produce biased PCC estimates: in forested areas where the target species is absent, PCC is overestimated, whereas in forested areas where the target species is abundant, PCC tends to be underestimated. We used zero-inflated regression modeling to reduce this bias and better predict PCC by species within each pixel in mixed conifer forests. Zero-inflated regression models use a two-step process: they first predict the presence or absence of the target species, and then predict continuous PCC only where the target species is present. We compared three widely used methods (RF, SVM, and GLM) with nine zero-inflated models for their ability to predict continuous PCC for each of five conifer species in heterogeneous forests of northwestern Montana using Landsat TM and OLI imagery. Depending on the species, our best zero-inflated models produced a mean difference of −3.84% to 2.26%, a 95% confidence interval of 6.22% to 13.09%, and an RMSE of 11.26% to 22.98%. The success of the zero-inflated approach was robust across the methods tested.
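The two-step process can be illustrated with a minimal sketch. It assumes a single hypothetical spectral predictor per pixel and uses logistic regression for the presence/absence step and ordinary least squares for the cover step. These are stand-ins chosen for brevity, not the RF, SVM, or GLM combinations the authors compared. Predictions are clipped to 0 to 100% PCC, and all function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x: Sequence[float], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float, float]:
    """Step 1 (hypothetical): fit P(present | x) = sigmoid(b0 + b1 * (x - mean)) by gradient descent."""
    n = len(x)
    mx = sum(x) / n  # centering the predictor helps gradient descent converge
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = _sigmoid(b0 + b1 * (xi - mx)) - yi
            g0 += err
            g1 += err * (xi - mx)
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1, mx

def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def fit_two_step(x: Sequence[float], pcc: Sequence[float]):
    """Fit the presence classifier on all pixels and the PCC regressor on present pixels only."""
    presence = [1 if v > 0 else 0 for v in pcc]
    classifier = fit_logistic(x, presence)
    present = [(xi, v) for xi, v in zip(x, pcc) if v > 0]
    regressor = fit_ols([xi for xi, _ in present], [v for _, v in present])
    return classifier, regressor

def predict_two_step(model, x: Sequence[float], threshold: float = 0.5) -> List[float]:
    """Return exactly 0 where step 1 says absent, otherwise step 2's PCC clipped to 0-100."""
    (b0, b1, mx), (a, b) = model
    out = []
    for xi in x:
        if _sigmoid(b0 + b1 * (xi - mx)) < threshold:
            out.append(0.0)  # step 1: species predicted absent
        else:
            out.append(min(100.0, max(0.0, a + b * xi)))  # step 2: continuous PCC
    return out

# Hypothetical training pixels: one spectral predictor and observed PCC (%).
x_train = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
pcc_train = [0, 0, 0, 0, 20, 30, 40, 50]
model = fit_two_step(x_train, pcc_train)
print(predict_two_step(model, [0.1, 0.45, 0.65, 0.9]))
```

Fitting a single regression to all eight pixels instead, with `fit_ols(x_train, pcc_train)`, yields a small positive PCC at x = 0.3, where the species is absent. This mirrors the bias described above for single-step models, which the separate presence step avoids by returning exact zeros.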
Both the zero-inflated and traditional methods were successful in estimating continuous canopy cover; however, the traditional models showed substantial bias, never correctly predicting the absence of the target species, whereas the zero-inflated models correctly predicted species absence 57% to 84% of the time, depending on the species. Visual comparison of the predicted maps with high-resolution imagery showed that the zero-inflated models also matched the landscape more closely, because the traditional models more often incorrectly predicted canopy cover in non-forested areas. The zero-inflated process dramatically reduced the bias of the results, allowing end users to make management decisions with greater confidence about where a target species is absent, something not possible with the traditional methods tested.
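The accuracy measures reported in this abstract can be sketched as follows. The abstract does not define the sign of the mean difference, so this sketch assumes predicted minus observed. `accuracy_summary` is a hypothetical helper, and the example values are invented for illustration. A prediction counts as a correct absence only when it is exactly zero, which is why a model with no separate presence step can score 0% on that measure.

```python
import math
from typing import Dict, Sequence

def accuracy_summary(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Mean difference, RMSE, and share of true absences predicted as exactly zero."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    diffs = [p - o for o, p in zip(observed, predicted)]  # assumed sign: predicted minus observed
    n = len(diffs)
    absent_preds = [p for o, p in zip(observed, predicted) if o == 0]
    return {
        "mean_difference": sum(diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
        # NaN when the reference data contain no absences
        "absence_correct": (sum(1 for p in absent_preds if p == 0) / len(absent_preds)
                            if absent_preds else float("nan")),
    }

# Hypothetical PCC values (%) for five reference pixels.
observed = [0, 0, 0, 20, 40]
zero_inflated = [0, 5, 0, 25, 30]
single_step = [3, 5, 2, 25, 30]
print(accuracy_summary(observed, zero_inflated))
print(accuracy_summary(observed, single_step))
```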
