The t-SNE Algorithm as a Tool to Improve the Quality of Reference Data Used in Accurate Mapping of Heterogeneous Non-Forest Vegetation

Anna Halladin-Dąbrowska, Adam Kania, Dominik Kopeć

doi:10.3390/rs12010039

Abstract

Supervised classification methods, used for many applications, including vegetation mapping, require accurate “ground truth” to be effective. Nevertheless, it is common for the quality of these data to be poorly verified before they are used for the training and validation of classification models. The fact that noisy or erroneous parts of the reference dataset are not removed is usually explained by the relatively high resistance of some algorithms to errors. The objective of this study was to demonstrate the rationale for cleaning the reference dataset used for the classification of heterogeneous non-forest vegetation, and to present a workflow based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm for the better integration of reference data with remote sensing data in order to improve outcomes. The proposed analysis is a new application of the t-SNE algorithm. The effectiveness of this workflow was tested by classifying three heterogeneous non-forest Natura 2000 habitats: Molinia meadows (Molinion caeruleae; code 6410), species-rich Nardus grassland (code 6230) and dry heaths (code 4030), employing two commonly used algorithms: random forest (RF) and AdaBoost (AB), which, according to the literature, differ in their resistance to errors in reference datasets. Polygons collected in the field in 2016 and 2017, containing no intentional errors, were used as the on-ground reference dataset. The remote sensing data used in the classification were obtained in 2017 during the peak growing season by a HySpex sensor consisting of two imaging spectrometers covering spectral ranges of 0.4–0.9 μm (VNIR-1800) and 0.9–2.5 μm (SWIR-384). The on-ground reference dataset was gradually cleaned by verifying candidate polygons selected by visual interpretation of t-SNE plots. Around 40–50% of candidate polygons were ultimately found to contain errors. Altogether, 15% of reference polygons were removed. As a result, the quality of the final map, as assessed by the Kappa and F1 accuracy measures as well as by visual evaluation, was significantly improved. The global map accuracy increased by about 6% (in Kappa coefficient), relative to the baseline classification obtained using random removal of the same number of reference polygons.
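For readers unfamiliar with the Kappa and F1 measures named in the abstract, the following is a minimal sketch of how both can be computed from paired reference and predicted labels. It is not the authors' code: the function names `cohen_kappa` and `f1_for_class` and the toy label lists are assumptions made for illustration, and only the habitat codes (6410, 6230, 4030) come from the text.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    # Share of samples where prediction and reference agree.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single class in both lists
    return (observed - expected) / (1.0 - expected)


def f1_for_class(y_true, y_pred, label):
    """F1 score of one class: harmonic mean of its precision and recall."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels (invented): one Molinia meadow sample is misclassified as Nardus grassland.
y_true = ["6410", "6410", "6230", "6230", "4030", "4030"]
y_pred = ["6410", "6230", "6230", "6230", "4030", "4030"]

if __name__ == "__main__":
    print("Kappa:", round(cohen_kappa(y_true, y_pred), 3))
    for code in ("6410", "6230", "4030"):
        print("F1", code, round(f1_for_class(y_true, y_pred, code), 3))
```

Kappa summarises agreement over the whole map, while the per-class F1 shows which habitat gains or loses when reference polygons are removed, which is why reporting both is useful when comparing a cleaned dataset against a random-removal baseline.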

Highlights

Reference data used for the supervised classification of vegetation are collected in a number of different ways
The IT0 was performed on the full on-ground reference dataset
The results clearly indicate that several modifications of the on-ground reference dataset, made through iterative t-distributed stochastic neighbor embedding (t-SNE) analysis, had a significant, positive impact on the classification results, as shown by the analysis of accuracy measures, the comparison of spectral curves and visual evaluation of the output map
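The study selected candidate polygons by visual interpretation of t-SNE plots and then verified them. As a rough automated stand-in for that visual step, the sketch below flags samples whose nearest neighbours in a 2-D embedding mostly carry a different class label. This is an assumption-laden illustration, not the authors' procedure: the function `flag_candidates`, its parameters `k` and `min_foreign_share`, and the toy coordinates are all hypothetical, and the embedding is assumed to have been computed beforehand by any t-SNE implementation.

```python
import math


def flag_candidates(points, labels, k=3, min_foreign_share=0.5):
    """Return indices of samples whose k nearest neighbours in the
    embedding mostly belong to a different class (hypothetical rule)."""
    flagged = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours = [labels[j] for _, j in dists[:k]]
        foreign = sum(1 for other in neighbours if other != label)
        if neighbours and foreign / len(neighbours) > min_foreign_share:
            flagged.append(i)
    return flagged


# Toy embedding (invented): two well-separated clusters, plus one sample
# labelled 4030 that sits in the middle of the 6410 cluster.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),          # 6410 cluster
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),  # 4030 cluster
    (0.5, 0.5),                                              # suspicious sample
]
labels = ["6410"] * 4 + ["4030"] * 4 + ["4030"]

if __name__ == "__main__":
    print("Candidates for manual verification:", flag_candidates(points, labels))
```

A flagged sample is only a candidate: in the study, fewer than half of the candidates turned out to contain errors after verification, so a rule like this should feed a review step rather than delete data directly.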

Summary

Introduction

Reference data used for the supervised classification of vegetation are collected in a number of different ways. Errors in the data may result from various factors, one of which is differences in the methods used to identify vegetation types when the data come from different sources and were collected for different purposes [1]. In this case, the problem may arise both from different interpretations of a given vegetation type by individual researchers and from the determination of vegetation units at different hierarchy levels (upper hierarchy level = higher level of generality).

[...] 2017 with a HySpex sensor developed by the Norwegian company Norsk Elektro Optikk (NEO). It is part of a remote sensing platform built within the framework of the HabitARS project by the MGGP. The number of flight lines was 25 and the flight orientation was west-east.

Objectives

Results

Conclusion


Journal: Remote Sensing	Publication Date: Dec 20, 2019
Citations: 14	License type: CC BY 4.0
