Abstract
Supervised classification methods, used in many applications including vegetation mapping, require accurate “ground truth” to be effective. Nevertheless, the quality of these data is often poorly verified before they are used to train and validate classification models. The failure to remove noisy or erroneous parts of the reference dataset is usually justified by the relatively high resistance of some algorithms to errors. The objective of this study was to demonstrate the rationale for cleaning the reference dataset used for the classification of heterogeneous non-forest vegetation, and to present a workflow based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm for the better integration of reference data with remote sensing data in order to improve outcomes. The proposed analysis is a new application of the t-SNE algorithm. The effectiveness of this workflow was tested by classifying three heterogeneous non-forest Natura 2000 habitats: Molinia meadows (Molinion caeruleae; code 6410), species-rich Nardus grassland (code 6230) and dry heaths (code 4030), employing two commonly used algorithms: random forest (RF) and AdaBoost (AB), which, according to the literature, differ in their resistance to errors in reference datasets. Polygons collected in the field in 2016 and 2017, containing no intentional errors, were used as the on-ground reference dataset. The remote sensing data used in the classification were obtained in 2017 during the peak growing season by a HySpex sensor consisting of two imaging spectrometers covering spectral ranges of 0.4–0.9 μm (VNIR-1800) and 0.9–2.5 μm (SWIR-384). The on-ground reference dataset was gradually cleaned by verifying candidate polygons selected by visual interpretation of t-SNE plots. Around 40–50% of candidate polygons were ultimately found to contain errors. Altogether, 15% of reference polygons were removed.
As a result, the quality of the final map, as assessed by the Kappa and F1 accuracy measures as well as by visual evaluation, was significantly improved. The global map accuracy increased by about 6% (in Kappa coefficient), relative to the baseline classification obtained using random removal of the same number of reference polygons.
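The two accuracy measures named above, Cohen's Kappa and the per-class F1 score, can both be derived from a confusion matrix. The following is a minimal sketch in plain Python rather than the authors' evaluation code. The helper names and the ten sample labels are hypothetical and chosen only for illustration; the habitat codes are reused as class labels.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are reference classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


def f1_per_class(matrix: List[List[int]]) -> List[float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical validation labels (invented for this example only).
labels = ["6410", "6230", "4030"]
y_true = ["6410"] * 4 + ["6230"] * 3 + ["4030"] * 3
y_pred = ["6410", "6410", "6410", "6230",
          "6230", "6230", "4030",
          "4030", "4030", "4030"]

matrix = confusion_matrix(y_true, y_pred, labels)
kappa = cohen_kappa(matrix)
f1_scores = f1_per_class(matrix)
```

Computing both measures once for the cleaned dataset and once for a baseline with the same number of randomly removed polygons would give the kind of comparison the abstract reports.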
Highlights
Reference data used for the supervised classification of vegetation are collected in a number of different ways
The IT0 was performed on the full on-ground reference dataset
The results clearly indicate that several modifications of the on-ground reference dataset, made through iterative t-distributed stochastic neighbor embedding (t-SNE) analysis, had a significant, positive impact on the classification results, as shown by the accuracy measures, the comparison of spectral curves and visual evaluation of the output map
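One step of the t-SNE screening described above might look like the following sketch, which assumes scikit-learn and NumPy are available. The study selected candidate polygons by visual interpretation of t-SNE plots; the nearest-neighbour vote used here is only an automated stand-in for that inspection, not the authors' method. The synthetic "spectra", the perplexity, the neighbour count `k` and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Synthetic stand-in for mean polygon spectra: 3 classes, 40 polygons each,
# 20 hypothetical bands. Real input would be hyperspectral features.
n_per_class, n_bands = 40, 20
centers = np.array([0.0, 5.0, 10.0])
X = np.vstack([rng.normal(c, 1.0, size=(n_per_class, n_bands)) for c in centers])
labels = np.repeat(np.arange(3), n_per_class)

# Deliberately mislabel two polygons per class to simulate reference errors.
noisy_idx = np.array([0, 1, 40, 41, 80, 81])
labels[noisy_idx] = (labels[noisy_idx] + 1) % 3

# Embed the spectra in 2-D with t-SNE (parameter values are assumptions).
embedding = TSNE(n_components=2, perplexity=15, init="pca",
                 random_state=0).fit_transform(X)

# Stand-in for visual inspection: flag polygons whose k nearest neighbours
# in the embedding mostly carry a different class label.
k = 5
_, neighbors = NearestNeighbors(n_neighbors=k + 1).fit(embedding).kneighbors(embedding)
neighbor_labels = labels[neighbors[:, 1:]]  # drop each point's own index
disagreement = (neighbor_labels != labels[:, None]).mean(axis=1)
candidates = np.flatnonzero(disagreement > 0.5)
```

The flagged indices are candidates for verification, not for automatic removal: in the study, only around 40–50% of candidate polygons were ultimately found to contain errors. Repeating the embedding after each round of verification would correspond to the iterative cleaning described above.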
Summary
Reference data used for the supervised classification of vegetation are collected in a number of different ways. Errors in the data may result from various factors, one of which is differences in the methods used to identify vegetation types when the data come from different sources and were collected for different purposes [1]. In this case, the problem may arise both from different interpretations of a given vegetation type by individual researchers and from the determination of vegetation units at different hierarchy levels (a higher hierarchy level means a higher level of generality). The remote sensing data were acquired in 2017 with a HySpex sensor developed by the Norwegian company Norsk Elektro Optikk (NEO). It is part of a remote sensing platform built within the framework of the HabitARS project by the MGGP. The data were acquired along 25 flight lines oriented west-east.