Abstract

Using machine learning and earth observation data to capture real-world variability in spatial predictive mapping depends on sample size, sampling design, and spatial extent. Nonetheless, some basic questions remain unresolved: a) How many samples are needed to fit the model? b) Which sampling designs are suitable for modeling? c) Do the results vary with spatial extent? These questions are crucial for spatial modeling projects and require proper investigation. In the present study, we evaluated two sampling designs, conditioned Latin Hypercube Sampling (cLHS) and Simple Random Sampling (SRS), with different sample sizes across three nested spatial extents. We defined one national extent, five regional extents within the national extent, and one local extent inside each regional extent. A Random Forest model was then used to predict above-ground forest biomass (AGB) at the local, regional, and national extents, comparing sample sizes of n = 25, 50, 100, 200, 300, and 500. Each combination of sampling design and sample size was repeated over 100 iterations. The results showed no significant difference between the two sampling designs: the accuracy metrics differed marginally at sample sizes of 25 and 50, and these differences became minimal at larger sizes, where both designs gave similar results. However, a closer analysis of all 100 repetitions revealed a noteworthy pattern: cLHS outperformed SRS in terms of both RMSE and variability. Regarding sample size, R² increased as the sample size grew; nevertheless, beyond roughly 300 to 500 samples the gain in accuracy became insignificant, indicating diminishing returns from excessively large samples. Moreover, model accuracy decreased as the spatial extent increased, possibly owing to environmental factors or the nature of the landscape.
This study therefore demonstrates the potential impact of sample size, sampling design, and spatial extent on model accuracy, and emphasizes the importance of reducing the sample size to reduce model complexity.
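
The abstract compares cLHS with SRS. As a minimal sketch, assuming a synthetic population of units described by three continuous covariates, the Python code below contrasts the two designs by how evenly each sample covers the covariate distributions. The cLHS part is a simplified variant: it keeps only the continuous-covariate stratum term of the objective (one unit per equal-probability stratum of each covariate) and uses a plain random-swap simulated-annealing loop, whereas the full method also matches the covariate correlation structure and categorical proportions. This is not the implementation used in the study, and the Random Forest step is omitted. All function names (`strata_edges`, `strata_objective`, `clhs`, `simple_random_sample`), the annealing parameters, and the synthetic covariates are hypothetical.

```python
import bisect
import math
import random


def strata_edges(values, n):
    """Lower bounds of n equal-probability strata (population quantiles k/n)."""
    s = sorted(values)
    m = len(s)
    return [s[(k * m) // n] for k in range(n)]


def strata_objective(sample, edges):
    """Sum over covariates and strata of |count - 1|; 0 is a perfect Latin hypercube."""
    n = len(edges[0])
    total = 0
    for j, e in enumerate(edges):
        counts = [0] * n
        for row in sample:
            # Index of the stratum whose lower bound is the largest edge <= value
            counts[bisect.bisect_right(e, row[j]) - 1] += 1
        total += sum(abs(c - 1) for c in counts)
    return total


def simple_random_sample(pop, n, rng):
    """SRS: n distinct units drawn with equal probability, without replacement."""
    return rng.sample(pop, n)


def clhs(pop, n, rng, iters=5000, temp=1.0, cooling=0.995):
    """Simplified cLHS: anneal a subset of pop toward one unit per stratum per covariate."""
    p = len(pop[0])
    edges = [strata_edges([row[j] for row in pop], n) for j in range(p)]
    idx = rng.sample(range(len(pop)), n)  # random starting subset
    chosen = set(idx)
    cur = strata_objective([pop[i] for i in idx], edges)
    for _ in range(iters):
        if cur == 0:
            break  # perfect stratification reached
        pos = rng.randrange(n)  # position of the sampled unit to swap out
        cand = rng.randrange(len(pop))  # candidate unit to swap in
        if cand in chosen:
            continue
        trial = idx.copy()
        trial[pos] = cand
        new = strata_objective([pop[i] for i in trial], edges)
        delta = new - cur
        # Metropolis rule: always accept non-worsening swaps, sometimes accept worse ones
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            chosen.discard(idx[pos])
            chosen.add(cand)
            idx, cur = trial, new
        temp *= cooling  # cooling makes worse swaps progressively rarer
    return [pop[i] for i in idx], edges


rng = random.Random(42)
# Hypothetical population of 2000 units with 3 continuous covariates
population = [(rng.random(), rng.gauss(0.0, 1.0), rng.random() ** 2) for _ in range(2000)]
clhs_sample, edges = clhs(population, 25, rng)
srs_sample = simple_random_sample(population, 25, rng)
print("cLHS objective:", strata_objective(clhs_sample, edges))
print("SRS objective: ", strata_objective(srs_sample, edges))
```

Scoring both samples against the same quantile edges makes the comparison fair: a lower objective means the sample spreads more evenly over each covariate's range, which is the property cLHS is designed to provide. Repeating each design many times, as the study did with 100 iterations per design and size, would also show how much coverage varies from one draw to the next.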
