Abstract

In remote sensing, the term accuracy typically expresses the degree of correctness of a map. Best practices in accuracy assessment have been widely researched and include guidelines on how to select validation data using probability sampling designs. In practice, however, probability samples may be lacking and, instead, cross-validation using non-probability samples is common. This practice is risky because the resulting accuracy estimates can easily be mistaken for map accuracy. The following question arises: to what extent are accuracy estimates obtained from non-probability samples representative of map accuracy? This letter introduces the T index to answer this question. Certain cross-validation designs (such as the common single-split or hold-out validation) provide representative accuracy estimates when hold-out sets are simple random samples of the map population. The T index essentially measures the probability that a hold-out set of unknown sampling design is a simple random sample. To that end, we compare the hold-out set's spread in the feature space against the spread of random unlabelled samples of the same size. Data spread is measured by a variant of Moran's I autocorrelation index. Consistent interpretation of the T index is proposed through the prism of significance testing, with T values < 0.05 indicating unreliable accuracy estimates. Its relevance and interpretation guidelines are also illustrated in a case study on crop-type mapping. Uptake of the T index by the remote-sensing community will help inform about, and sometimes caution against, the representativeness of accuracy estimates obtained by cross-validation, so that users can better decide whether a map is fit for their purpose or how its accuracy impacts their application. Subsequently, the T index will build trust and improve the transparency of accuracy assessment in conditions which deviate from best practices.
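The procedure described above can be sketched as a Monte Carlo significance test: compute a spread statistic for the hold-out set in feature space, compare it against the same statistic for many simple random samples drawn from the unlabelled map population, and report an empirical p-value. The sketch below is a minimal illustration of this idea, not the paper's implementation: it substitutes mean pairwise distance for the Moran's-I-based spread variant, and the function and parameter names (`spread`, `t_index`, `n_draws`) are hypothetical.

```python
import numpy as np

def spread(X):
    """Mean pairwise Euclidean distance in feature space.
    A simple stand-in (assumption) for the Moran's-I-based
    spread measure used in the paper."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    return d.sum() / (n * (n - 1))  # average of off-diagonal entries

def t_index(holdout, pool, n_draws=999, seed=None):
    """Empirical two-sided p-value for the hypothesis that `holdout`
    is a simple random sample of `pool`, based on feature-space spread.
    Small values (< 0.05) suggest the hold-out set is not a simple
    random sample, so its accuracy estimates may be unreliable."""
    rng = np.random.default_rng(seed)
    s_obs = spread(holdout)
    s_rand = np.array([
        spread(pool[rng.choice(len(pool), size=len(holdout), replace=False)])
        for _ in range(n_draws)
    ])
    # Rank-based p-value with the usual +1 correction, doubled for two sides.
    lo = (np.sum(s_rand <= s_obs) + 1) / (n_draws + 1)
    hi = (np.sum(s_rand >= s_obs) + 1) / (n_draws + 1)
    return min(1.0, 2 * min(lo, hi))
```

For example, a hold-out set clustered in one region of feature space has a much smaller spread than typical random samples of the same size, so its T value falls below 0.05, flagging its cross-validation accuracy estimate as unrepresentative.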

Highlights

  • Protocols on how to collect reliable validation data to assess the accuracy of maps derived from remotely-sensed data have been established since the early days of the discipline [1]

  • While it is well known that accuracy estimates obtained in this fashion may not be representative of map accuracy, there is currently no method to evaluate how much the two differ

  • It is directly interpretable as the probability that a validation set is a simple random sample of the map population, an assumption that must hold for hold-out cross-validation to provide representative estimates of map accuracy


Introduction

Protocols on how to collect reliable validation data to assess the accuracy of maps derived from remotely-sensed data have been established since the early days of the discipline [1]. Among other things, accuracy depends on the characteristics of the training data: it positively correlates with sample size but is affected by the presence of outliers and by imbalance among classes [7,8,9]. Given the costs associated with data collection, it is of value to reduce the training set size without decreasing accuracy [10] and to identify where to collect data so that accuracy is maximised and costs are minimised [11]. The findings of such studies can inform guidelines for collecting training data.

