An Explorative Study on Estimating Local Accuracies in Land-Cover Information Using Logistic Regression and Class-Heterogeneity-Stratified Data

Jingxiong Zhang, Wenjing Yang, Wangle Zhang, Yingchang Xiu, Di Liu, Yu Wang

doi:10.3390/rs10101581

Abstract

It is increasingly recognized that classification accuracy should be characterized locally, at the level of individual pixels, to depict its spatial variability and thus inform users and producers of land-cover information better than conventional error-matrix-based methods can. Local or per-pixel accuracy is usually estimated through empirical modelling, such as logistic regression, which often proceeds in a class-aggregated or a class-stratified way, with the latter being generally more accurate because it accommodates between-class inhomogeneity in accuracy-context relations. As an extension to class-stratified modelling, this paper proposes class-heterogeneity-stratified modelling, in which logistic models are built separately for contextually heterogeneous and homogeneous sub-strata within each map-class stratum, so that within-class inhomogeneity in accuracy-context relations is handled properly and estimation accuracy increases. Unlike in the existing literature, where sampling is usually treated separately, the double-stratification method is also adopted in sampling design, so that more sample data are likely to be allocated to heterogeneous sub-strata (which are more prone to misclassification than homogeneous ones). Applied jointly to sampling and modelling, the class-heterogeneity-stratified method thus constitutes an integrative framework for accuracy estimation and information refinement. As a first step towards such a framework, this paper investigates the performance of the proposed double-stratification method in local accuracy estimation, and its sensitivity to sample size, in comparison with existing methods through a case study of GlobeLand30 2010 land cover over Wuhan, China. A detailed review of existing methods for the analysis, estimation, and use of local accuracy is provided to put the proposed research in a broader context.
Candidate explanatory variables for logistic regression included the sample pixels’ map classes, positions, and contextual features computed in moving windows of different sizes. The relative performance of the methods was evaluated against an independent reference sample, and all methods were found to be reliable. The proposed method was in general the most accurate across the sample sizes tested. Its competitive performance is thus demonstrated, reinforcing its potential for information refinement. Extensions to the proposed method and its uncertainty aspects are discussed, and further research is proposed.
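
The class-heterogeneity-stratified modelling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the heterogeneity rule (more than one class in a 3 x 3 window), the single contextual feature (the share of window pixels carrying the pixel's own class), the toy sample, and all function names are assumptions made for illustration, and a plain gradient-descent fit stands in for whatever logistic regression routine is actually used.

```python
import math
from collections import defaultdict


def is_heterogeneous(grid, row, col, half=1):
    """Assumed rule: a pixel is 'heterogeneous' if its moving window
    (side 2*half+1, clipped at the map edge) holds more than one class."""
    classes = set()
    for r in range(max(0, row - half), min(len(grid), row + half + 1)):
        for c in range(max(0, col - half), min(len(grid[0]), col + half + 1)):
            classes.add(grid[r][c])
    return len(classes) > 1


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic log-loss.
    Returns [intercept, coef_1, ..., coef_k]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def fit_stratified(samples):
    """Fit one logistic model per (map class, heterogeneity) sub-stratum.
    samples: iterable of (map_class, heterogeneous, features, correct)."""
    groups = defaultdict(lambda: ([], []))
    for cls, het, feats, correct in samples:
        X, y = groups[(cls, het)]
        X.append(list(feats))
        y.append(1.0 if correct else 0.0)
    return {key: fit_logistic(X, y) for key, (X, y) in groups.items()}


def predict_accuracy(models, cls, het, feats):
    """Estimated probability that the map label at a pixel is correct."""
    w = models[(cls, het)]
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], feats)))


# Hypothetical toy sample: (class, heterogeneous?, [same-class share], correct?)
samples = [
    ("forest", True, [0.2], False), ("forest", True, [0.4], False),
    ("forest", True, [0.5], True), ("forest", True, [0.6], False),
    ("forest", True, [0.7], True), ("forest", True, [0.8], True),
    ("forest", False, [1.0], True), ("forest", False, [1.0], True),
    ("forest", False, [1.0], True), ("forest", False, [1.0], False),
]
models = fit_stratified(samples)
```

Calling predict_accuracy(models, "forest", True, [0.8]) then returns the estimated local accuracy for a forest pixel in a heterogeneous neighbourhood. Keeping a separate model per sub-stratum is what lets the accuracy-context relation differ between heterogeneous and homogeneous pixels of the same class.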

Highlights

Land-cover information is important for resource management and environmental modelling
Local accuracy characterization for land-cover information products is important for users and producers alike
Method EO is proposed for sampling and modelling in the context of local accuracy estimation. This method was compared with three alternative methods (Methods EI, CS, and CA) based on GlobeLand30 2010 land cover over Wuhan.

Summary

Introduction

Land-cover information is important for resource management and environmental modelling. A variety of land-cover information products (static and dynamic, crisp and soft) are generated from different sensor datasets at regional and global scales [1,2,3,4,5,6]. This research focuses on static land-cover information coded with discrete class labels rather than percent covers (also called fractional covers or class proportions). Land-cover information is always inaccurate to some extent. Research efforts are increasingly directed towards describing, quantifying, and analyzing accuracies (or misclassification errors) in land-cover information [7,8,9,10,11,12,13].

Methods

Findings

Discussion

Conclusion