Abstract

In recent years, the volume of spatial datasets has increased significantly. This is attributed to, among other factors, the higher temporal resolution of sensors aboard recently launched satellites. The increased data volume, combined with the large number of indices that can be derived from it, may lead to high multicollinearity and redundant features that compromise classifier performance. Dimension reduction algorithms can select a subset of these features and thereby increase their predictive potential. An investigation into the application of feature selection techniques to multi-temporal multispectral datasets such as Sentinel-2 is therefore valuable for vegetation mapping. In this study, ten feature selection methods belonging to five groups (similarity-based, statistical-based, sparse-learning-based, information-theoretical-based, and wrapper methods) were compared on the basis of f-score and data size for mapping a landscape infested by the Parthenium weed (Parthenium hysterophorus). Overall, ReliefF (a similarity-based approach) was the best-performing feature selection method, as demonstrated by high f-score values for Parthenium weed and a small optimal feature subset. Although svm-b (a wrapper method) yielded the highest accuracies, its optimal feature subset was quite large. Results also showed that data size affects the performance of feature selection algorithms, except for svm-b and statistical-based methods such as Gini-index and F-score. The findings provide guidance on the application of feature selection methods for accurate mapping of invasive plant species in general, and Parthenium weed in particular, using new multispectral imagery with high temporal resolution.

Highlights

  • The dimension space of variables given as input to a classifier can be reduced without significant loss of information, which decreases the classifier's processing time and improves the quality of its output [1]

  • Notably, for some feature selection methods, such as Gini-index (statistical-based), ReliefF (similarity-based), and LL-121, the f-score accuracy was attained with smaller feature subsets

  • The results showed that feature selection algorithms could reduce the dimensionality of Sentinel-2 spectral bands combined with vegetation indices


Summary

Introduction

The dimension space of variables given as input to a classifier can be reduced without significant loss of information, which decreases the classifier's processing time and improves the quality of its output [1]. With the launch of high temporal resolution sensors such as Sentinel-2, the amount of image data that can be acquired within a short period has increased considerably [8]. This is due to the sensor’s improved spectral resolution (13 bands) and five-day temporal resolution [9]. High-dimensional remotely sensed datasets contain irrelevant information and highly redundant features. Such dimensionality causes statistical algorithms to overfit the data, which degrades their performance in both quantitative (e.g., leaf area index and biomass) and qualitative (e.g., land-cover) applications [10]. Dimension reduction algorithms can select a subset of features from the high-dimensional data, increasing their predictive potential [13].
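To make the idea of selecting a feature subset concrete, the following is a minimal sketch of a filter-style ranking in plain Python. It scores each feature with a one-way ANOVA F-statistic and keeps the top k. This statistic is used here only as a simple stand-in for a statistical-based criterion; it is not the study's implementation, and the feature names (`ndvi_like`, `band_like`, `noise`), the class labels, and the synthetic values are all hypothetical.

```python
import random
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    group_means = {y: mean(g) for y, g in groups.items()}
    # Between-class and within-class sums of squares.
    ss_between = sum(
        len(g) * (group_means[y] - grand_mean) ** 2 for y, g in groups.items()
    )
    ss_within = sum(
        (v - group_means[y]) ** 2 for y, g in groups.items() for v in g
    )
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(features, labels, k):
    """Rank features by F-statistic and return the k best names plus all scores."""
    scores = {name: anova_f_score(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 50 "weed" and 50 "other" samples, three made-up features.
random.seed(0)
labels = ["weed"] * 50 + ["other"] * 50
features = {
    # Strongly separates the two classes.
    "ndvi_like": [random.gauss(0.8, 0.05) for _ in range(50)]
    + [random.gauss(0.4, 0.05) for _ in range(50)],
    # Weakly separates the two classes.
    "band_like": [random.gauss(0.3, 0.05) for _ in range(50)]
    + [random.gauss(0.2, 0.05) for _ in range(50)],
    # Carries no class information.
    "noise": [random.gauss(0.5, 0.1) for _ in range(100)],
}

selected, scores = select_top_k(features, labels, k=2)
print(selected)
```

A filter like this scores each feature independently, so it does not remove features that are redundant with one another; handling multicollinearity would need an additional step, such as a correlation check among the selected features.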

