Abstract

Mapping species through classification of imaging spectroscopy data is facilitating research to understand tree species distributions at increasingly large spatial scales. Classification requires a dataset of field observations matched to the image, which will often reflect natural species distributions, resulting in an imbalanced dataset with many samples for common species and few samples for less common species. Despite the high prevalence of imbalanced datasets in multiclass species predictions, the effect on species prediction accuracy and landscape species abundance has not yet been quantified. First, we trained and assessed the accuracy of a support vector machine (SVM) model with a highly imbalanced dataset of 20 tropical species and one mixed-species class of 24 species identified in a hyperspectral image mosaic (350–2500 nm) of Panamanian farmland and secondary forest fragments. The model, with an overall accuracy of 62% ± 2.3% and F-score of 59% ± 2.7%, was applied to the full image mosaic (23,000 ha at a 2-m resolution) to produce a species prediction map, which suggested that this tropical agricultural landscape is more diverse than field-based studies have reported. Second, we quantified the effect of class imbalance on model accuracy. Model assessment showed a trend where species with more samples were consistently overpredicted while species with fewer samples were underpredicted. Standardizing sample size reduced model accuracy, but also reduced the level of species over- and under-prediction. This study advances operational species mapping of diverse tropical landscapes by detailing the effect of imbalanced data on classification accuracy and providing estimates of tree species abundance in an agricultural landscape. Species maps using data and methods presented here can be used in landscape analyses of species distributions to understand human or environmental effects, in addition to focusing conservation efforts in areas with high tree cover and diversity.
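The two ideas in the abstract that lend themselves to a small worked example are the over- and under-prediction of classes and the standardization of sample size. The sketch below is illustrative only and is not the authors' code: the function names `prediction_ratio` and `standardize_sample_size`, the ratio definition (predicted count divided by true count per class), and random downsampling as the way to equalize class sizes are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def prediction_ratio(true_labels, predicted_labels):
    """Predicted count divided by true count for each class in the test set.

    A value above 1 means the class is predicted more often than it occurs
    (over-prediction); a value below 1 means under-prediction.
    (Hypothetical metric definition, assumed for illustration.)
    """
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    return {cls: pred_counts.get(cls, 0) / n for cls, n in true_counts.items()}


def standardize_sample_size(samples, labels, n_per_class, seed=0):
    """Randomly downsample every class to at most n_per_class samples.

    Random downsampling is an assumed strategy; the source only states that
    sample size was standardized.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    out_samples, out_labels = [], []
    for label in sorted(by_class):
        items = by_class[label]
        k = min(n_per_class, len(items))
        out_samples.extend(rng.sample(items, k))
        out_labels.extend([label] * k)
    return out_samples, out_labels


# Toy test set: a common class "a" (8 crowns) and a rare class "b" (2 crowns).
true = ["a"] * 8 + ["b"] * 2
pred = ["a"] * 9 + ["b"] * 1
ratios = prediction_ratio(true, pred)

# Equalize both classes to 2 samples each before retraining.
balanced_samples, balanced_labels = standardize_sample_size(
    list(range(10)), true, n_per_class=2
)
```

In this toy case the common class has a ratio above 1 and the rare class a ratio below 1, which is the pattern the abstract describes for the imbalanced model. Retraining on the balanced subset would then show whether the ratios move toward 1 at the cost of overall accuracy.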

Highlights

  • Mapping tree species distributions in tropical landscapes has been a clear goal of the remote sensing community [1,2] because of its ecological applications for understanding spatial patterns of tree populations and species co-occurrence [3,4], and conservation applications to identify regions of high diversity [5], invasive species [6,7,8], or rare and ecologically important species [9]

  • We have applied evaluation tools to quantify the effect of widely differing training class sizes for species classification from imaging spectroscopy data and generated landscape species abundance distributions, which have been adjusted to account for model error

  • Our classification model with imbalanced data suggests that while more common species were overrepresented in the model predictions for the test dataset, this had a small effect on the landscape species distributions

Summary

Introduction

Mapping tree species distributions in tropical landscapes has been a clear goal of the remote sensing community [1,2] because of its ecological applications for understanding spatial patterns of tree populations and species co-occurrence [3,4], and conservation applications to identify regions of high diversity [5], invasive species [6,7,8], or rare and ecologically important species [9]. High spatial resolution imaging spectroscopy that can resolve individual tree crowns and capture small differences in reflectance patterns among species can help achieve these goals [10]. To support these applications, species classification is moving beyond assessing the spectral separability of species and towards operational species mapping, where classification models are applied to an entire remotely sensed image to predict species identity and locations across a landscape [3,11,12,13]. Many studies have explored how the spectral uniqueness of species at multiple scales affects the success of classification models [9,14,15]. Other studies have explored the scope of the model, quantifying the decline in prediction accuracy with an increase in the number of classes [16] and with smaller class sample sizes [17], in addition to providing guidelines for the optimal sample size needed to achieve maximum accuracy given the number of classes in the model [15].

