Choosing a sample size allocation to strata based on trade-offs in precision when estimating accuracy and area of a rare class from a stratified sample

Stephen V Stehman, John E Wagner

doi:10.1016/j.rse.2023.113881

Abstract

Stratified random sampling is often used to obtain reference data for assessing the accuracy of land cover maps created from remotely sensed data and for estimating area of land cover and land cover change. The sample size allocation to strata determines the precision of estimates of user's accuracy, producer's accuracy, and proportion of area. Different choices of nh (the sample size in stratum h) may favor precision of one estimate at the expense of a larger standard error for another estimate. Here we address the question of optimally allocating a sample of size n when multiple estimates are of interest, focusing on applications in which the target class is rare (≤10% of the study region). We limit attention to the case of stratified random sampling with two strata, stratum 1 being the mapped area of the target class with sample size n1 and stratum 2 including all other area with sample size n2 = n - n1. We investigate how n1 changes depending on which estimates are targeted by the optimal allocation. For example, the optimal n1 would differ when the estimated proportion of area is optimized versus when estimates of user's and producer's accuracies are optimized. We compare the standard errors resulting from these different optimal allocations for a diverse set of 80 populations created by all possible combinations of five proportions of area of the rare class (p = 0.001, 0.005, 0.01, 0.05, and 0.10), four user's accuracies (60%, 75%, 85%, and 95%), and four producer's accuracies (60%, 75%, 85%, and 95%). The results indicate that estimating accuracy exerts a stronger impact on the optimal n1 than estimating proportion of area. Larger n1 is advantageous for precise estimation of user's accuracy, but conflicts with the smaller n1 that is optimal for estimating producer's accuracy and proportion of area. The trade-offs among the standard errors of the three estimates resulting from different n1 are magnified as the target class becomes rarer (i.e., p decreases). 
When deciding between an allocation optimizing estimation of accuracy versus an allocation optimizing estimation of proportion of area, it is precision of estimated user's accuracy that is most strongly impacted by the choice. Conversely, precision of estimated producer's accuracy and precision of estimated proportion of area are relatively insensitive to the decision between these two allocation options. Choosing a sample allocation when multiple estimates are of interest is complex because of the precision trade-offs among the estimates. This article provides quantitative evidence to guide sample allocation decisions given the estimation objectives specified for a particular application.
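
The trade-offs described above can be illustrated with a short numerical sketch. It is not the authors' code. It assumes a two-stratum design with no finite population correction, and it uses the usual binomial variance within each stratum plus a first-order (delta-method) approximation for the variance of the ratio estimator of producer's accuracy. The function names (`strata_from_population`, `standard_errors`, `best_n1`), the total sample size n = 1000, and the example population (p = 0.01, user's accuracy 75%, producer's accuracy 75%) are choices made here for illustration only.

```python
import math


def strata_from_population(p, users_acc, producers_acc):
    """Derive the two-stratum layout from p, user's and producer's accuracy.

    Returns (w1, w2, phi1, phi2): w_h is the area proportion of stratum h and
    phi_h is the proportion of stratum h that truly belongs to the target class.
    """
    p11 = producers_acc * p      # area both mapped and truly the target class
    w1 = p11 / users_acc         # stratum 1: area mapped as the target class
    w2 = 1.0 - w1                # stratum 2: all other area
    phi1 = users_acc             # target-class share inside stratum 1
    phi2 = (p - p11) / w2        # omitted target-class share inside stratum 2
    return w1, w2, phi1, phi2


def standard_errors(p, users_acc, producers_acc, n, n1):
    """Approximate standard errors of the three estimates for a given n1.

    Assumption: large strata (no finite population correction) and a
    delta-method variance for the producer's accuracy ratio estimator.
    """
    w1, w2, phi1, phi2 = strata_from_population(p, users_acc, producers_acc)
    n2 = n - n1
    v1 = w1 ** 2 * phi1 * (1.0 - phi1) / n1   # stratum 1 variance contribution
    v2 = w2 ** 2 * phi2 * (1.0 - phi2) / n2   # stratum 2 variance contribution
    return {
        "users": math.sqrt(phi1 * (1.0 - phi1) / n1),
        "producers": math.sqrt(
            (1.0 - producers_acc) ** 2 * v1 + producers_acc ** 2 * v2
        ) / p,
        "area": math.sqrt(v1 + v2),
    }


def best_n1(p, users_acc, producers_acc, n, target):
    """Grid-search the n1 that minimizes the standard error of one estimate."""
    candidates = range(2, n - 1)  # keeps at least 2 sample units per stratum
    return min(
        candidates,
        key=lambda n1: standard_errors(p, users_acc, producers_acc, n, n1)[target],
    )


# Hypothetical example population: rare class, moderate accuracies.
p, users_acc, producers_acc, n = 0.01, 0.75, 0.75, 1000
n1_area = best_n1(p, users_acc, producers_acc, n, "area")
n1_prod = best_n1(p, users_acc, producers_acc, n, "producers")

# Compare the three standard errors under each allocation and an equal split.
for n1 in (n1_prod, n1_area, n // 2):
    se = standard_errors(p, users_acc, producers_acc, n, n1)
    print(n1, {name: round(value, 5) for name, value in se.items()})
```

Under these assumptions the standard error of user's accuracy depends only on n1 and always shrinks as n1 grows, whereas the allocations that minimize the standard errors of producer's accuracy and proportion of area both place only a small share of the sample in stratum 1. Rerunning the loop with a smaller p shows how the conflict sharpens as the class becomes rarer.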

Full Text