Abstract

Convolutional neural network (CNN)-based deep learning (DL) is a powerful, recently developed image classification approach. With origins in the computer vision and image processing communities, the accuracy assessment methods developed for CNN-based DL use a wide range of metrics that may be unfamiliar to the remote sensing (RS) community. To explore the differences between traditional RS and DL RS methods, we surveyed a random selection of 100 papers from the RS DL literature. The results show that RS DL studies have largely abandoned traditional RS accuracy assessment terminology, though some of the accuracy measures typically used in DL papers, most notably precision and recall, have direct equivalents in traditional RS terminology. Some of the DL accuracy terms have multiple names, or are equivalent to another measure. In our sample, DL studies only rarely reported a complete confusion matrix, and when they did so, it was even rarer for the confusion matrix to estimate population properties. On the other hand, some DL studies are increasingly paying attention to the role of class prevalence in designing accuracy assessment approaches. DL studies that evaluate the decision boundary threshold over a range of values tend to use the precision-recall (P-R) curve and the associated area under the curve (AUC) measures of average precision (AP) and mean average precision (mAP), rather than the traditional receiver operating characteristic (ROC) curve and its AUC. DL studies are also notable for testing the generalization of their models on entirely new datasets, including data from new areas, new acquisition times, or even new sensors.

Highlights

  • The importance of assessment of the accuracy of remote sensing (RS) thematic classification has been recognized since the early days of remote sensing [1,2,3,4,5,6,7,8,9,10,11]

  • We reviewed 100 randomly-selected papers focusing on deep learning (DL) classification that were published in eight major RS journals in 2020

  • The review of these papers confirms that the RS DL community has largely abandoned traditional RS accuracy assessment terminology


Summary

Introduction

The importance of assessment of the accuracy of remote sensing (RS) thematic classification has been recognized since the early days of remote sensing [1,2,3,4,5,6,7,8,9,10,11]. There is a general consensus regarding the importance of unbiased, randomized sampling to support the generation of summary accuracy data, normally presented in the form of a table called the confusion matrix, or error matrix. This table forms the basis for calculating summary metrics, most commonly the overall accuracy (OA), the Kappa statistic (though the use of this statistic has been challenged [12,13]), and the class-specific statistics of user’s (UA) and producer’s accuracy (PA). Note that spatial context, spectral information, edge, and textural information is highlighted by different filters and different convolutional layers, allowing a high degree of data abstraction.
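As an illustration of how the summary metrics named in the introduction (OA, Kappa, UA, and PA) follow from a confusion matrix, here is a minimal Python sketch. It is not taken from the paper: the function name `accuracy_summary`, the two-class counts, and the layout convention (rows are classified labels, columns are reference labels) are all assumptions made for this example. Under that layout, UA corresponds to the DL term precision and PA corresponds to recall.

```python
def accuracy_summary(matrix):
    """Summary metrics from a square confusion matrix of sample counts.

    Assumed layout (an assumption of this sketch, not stated in the text):
    rows are classified (map) labels, columns are reference labels.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]

    # Overall accuracy: proportion of samples on the diagonal.
    overall = sum(diagonal) / n

    # Kappa: agreement beyond the chance agreement implied by the marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)

    # User's accuracy (precision): correct / all samples classified as the class.
    users = [d / r for d, r in zip(diagonal, row_totals)]
    # Producer's accuracy (recall): correct / all reference samples of the class.
    producers = [d / c for d, c in zip(diagonal, col_totals)]

    return {"OA": overall, "kappa": kappa, "UA": users, "PA": producers}


# Hypothetical counts for a two-class problem (illustrative only).
example = [
    [45, 5],   # classified as class 0: 45 correct, 5 actually class 1
    [10, 40],  # classified as class 1: 10 actually class 0, 40 correct
]
summary = accuracy_summary(example)
print(summary)
```

For these made-up counts, OA is 0.85 and Kappa is approximately 0.70. The sketch works directly on sample counts, so it estimates population properties only if the samples come from an unbiased, randomized design; it also does not guard against classes with zero row or column totals.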

Example
Traditional Remote Sensing Accuracy Evaluation
The Purpose of Accuracy Assessment
Deep Learning Accuracy Assessment Example Use Cases
The Confusion Matrix
Summary Metrics Derived from Confusion Matrix
The Binary Confusion Matrix
The Multiclass DL CNN Confusion Matrix
Literature
Overall Accuracy and Kappa
Recall and Precision
Specificity and Negative Predictive Value
Balanced Accuracy and Matthews Correlation Coefficient
Receiver
Comparison of DL and Traditional RS Approaches to Accuracy Assessment
Clarity in Terminology
Findings
Class Prevalence and Imbalance
Conclusions
