Hyperspectral data and machine learning offer great potential for identifying valuable open ecosystems. Because hyperspectral images contain a large volume of data, their preprocessing must include dimensionality reduction. The main goal of this study was to test the effectiveness of two types of feature reduction (feature selection and feature extraction) for classification with the Random Forest algorithm. The comparison covered two ecosystems, heathlands and mires, both protected as Natura 2000 habitats. Two feature extraction transformations were chosen, namely Minimum Noise Fraction (MNF) and Principal Component Analysis (PCA), while Linear Discriminant Analysis (LDA) was used as a feature selection method. The results showed that, irrespective of the class type, accuracy is higher with feature extraction (mean F1 accuracy of 0.922) than with feature selection (mean F1 accuracy of 0.787). At the same time, no significant differences in accuracy were found between the MNF and PCA methods. Although LDA resulted in lower accuracies (0.816 for heathland and 0.750 for mires), its F1 values are still relatively high, so the method remains usable. This is the first confirmation of the effectiveness of LDA for feature reduction in the identification of open natural vegetation.
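To make the reported metric concrete, the following minimal sketch shows one way a per-class F1 score and its mean could be computed from reference and predicted labels. It assumes the mean F1 is an unweighted average over classes, which the abstract does not state. The function names `per_class_f1` and `mean_f1`, the `background` class, and the toy labels are all hypothetical and are not data or code from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {class_label: F1} computed from paired true/predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but the sample belongs elsewhere
            fn[t] += 1  # a sample of class t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def mean_f1(y_true, y_pred):
    """Unweighted (macro) average of the per-class F1 scores (an assumption)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not data from the study.
y_true = ["heathland", "heathland", "heathland", "mire", "mire", "background"]
y_pred = ["heathland", "heathland", "mire", "mire", "mire", "background"]

scores = per_class_f1(y_true, y_pred)
average = mean_f1(y_true, y_pred)
```

In this toy case one heathland sample is mislabeled as mire, which lowers the F1 of both classes: heathland loses recall and mire loses precision.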