Classification of imbalanced hyperspectral images using SMOTE-based deep learning methods

Akın Özdemir, Kemal Polat, Adi Alhudhaif

doi:10.1016/j.eswa.2021.114986

Abstract

Hyperspectral imaging (HSI) is one of the most advanced methods of digital imaging. Unlike RGB imaging, it captures a wide range of the electromagnetic spectrum. Imbalanced datasets are frequently encountered in machine learning, and they can degrade classifier performance; to avoid this problem, the dataset must be balanced. The main motivation of this study is to reveal the differences in classifier performance between the original imbalanced dataset and versions of it modified by balancing methods. In the proposed method, hyperspectral image classification is carried out on the Xuzhou Hyspex dataset, which contains nine classes (bareland-1, bareland-2, crops-1, crops-2, lake, coals, cement, trees, and house-roofs), using convolutional neural networks (CNNs) together with four dataset balancing methods: SMOTE, ADASYN, K-Means, and Cluster. The dataset was taken from the IEEE DataPort machine learning repository. To classify the hyperspectral image, CNNs were used with three multi-class classification approaches: direct multi-class classification, One-vs-All, and One-vs-One. The dataset was split in two different ways: a 50%–50% hold-out split and 5-fold cross-validation. The performance of the proposed models was evaluated using the confusion matrix, classification accuracy, precision, recall, and F-measure. Without dataset balancing, the classification accuracies obtained with multi-class classification, One-vs-All, and One-vs-One are 93.63%, 92.33%, and 88.36%, respectively, for the 50%–50% train-test split, and 94.46%, 94%, and 92.24% for 5-fold cross-validation. After SMOTE balancing, the corresponding accuracies are 96.41%, 95.6%, and 92.53% for the 50%–50% train-test split and 96.49%, 95.64%, and 93.38% for 5-fold cross-validation.
After ADASYN balancing, the accuracies obtained with multi-class classification, One-vs-All, and One-vs-One are 95.86%, 93.62%, and 87.05%, respectively, for the 50%–50% train-test split and 96.38%, 95.09%, and 91.55% for 5-fold cross-validation. After K-Means balancing, they are 95.23%, 93.36%, and 90.6% for the 50%–50% split and 95.74%, 94.72%, and 91.94% for 5-fold cross-validation. After Cluster balancing, they are 94.83%, 94.1%, and 90.07% for the 50%–50% split and 96.28%, 95.88%, and 92.5% for 5-fold cross-validation. The results show that the best model is SMOTE-balanced multi-class classification evaluated with 5-fold cross-validation.
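SMOTE is the balancing method with the best results in this study. The abstract does not describe the authors' implementation, so the following is only a minimal sketch of the general SMOTE idea, under the assumption that each pixel is treated as a plain spectral vector: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest same-class neighbours. The function names `smote` and `balance_with_smote`, the `k=5` default, and the choice to oversample every class up to the size of the largest class are illustrative assumptions. For real experiments, the imbalanced-learn package provides maintained `SMOTE` and `ADASYN` classes.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic vectors interpolated within one minority class."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples of the class")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a sample cannot have more neighbours than peers

    # k nearest same-class neighbours (Euclidean) of every minority sample
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # random base sample
        j = rng.choice(neighbours[i])     # random neighbour of that sample
        gap = rng.random()                # interpolation factor in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


def balance_with_smote(samples, labels, k=5, seed=0):
    """Hypothetical helper: oversample every class up to the largest class size."""
    by_class = {}
    for x, c in zip(samples, labels):
        by_class.setdefault(c, []).append(list(x))
    target = max(len(members) for members in by_class.values())

    out_x = [list(x) for x in samples]  # originals first, synthetic samples appended
    out_y = list(labels)
    for c, members in by_class.items():
        deficit = target - len(members)
        if deficit > 0:
            out_x.extend(smote(members, deficit, k=k, seed=seed))
            out_y.extend([c] * deficit)
    return out_x, out_y
```

For example, with four toy `trees` vectors and two toy `lake` vectors, `balance_with_smote(X, y, k=1)` returns eight samples, four per class, and the two new `lake` samples lie between the two original `lake` vectors. Balancing is normally applied only to the training portion of a hold-out split or cross-validation fold, so that no synthetic samples end up in the test data.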
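The study compares direct multi-class output with One-vs-All and One-vs-One decomposition but does not state how the binary outputs are combined. The sketch below assumes the usual rules: One-vs-All picks the class whose binary model gives the highest confidence, and One-vs-One takes a majority vote over all class pairs. The binary models are hypothetical callables standing in for trained binary CNNs, and the tie-breaking rule is an assumption. With the nine classes used here, One-vs-All needs 9 binary models and One-vs-One needs 9·8/2 = 36.

```python
from collections import Counter
from itertools import combinations


def one_vs_all_predict(x, scorers):
    """scorers maps each class to a binary model scoring how much x resembles it."""
    return max(scorers, key=lambda c: scorers[c](x))


def one_vs_one_predict(x, classes, pairwise):
    """pairwise maps each pair (a, b) to a binary model that returns a or b."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    best = max(votes.values())
    # tie-break (assumption): first class in `classes` with the top vote count
    return next(c for c in classes if votes[c] == best)


def classifier_counts(n_classes):
    """Number of binary models needed by One-vs-All and One-vs-One."""
    return n_classes, n_classes * (n_classes - 1) // 2
```

`classifier_counts(9)` returns `(9, 36)`, which shows why One-vs-One is considerably more expensive to train for this dataset.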
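All evaluation metrics reported in the study can be derived from the confusion matrix. The sketch below uses the standard per-class definitions. It assumes that "F-measure" means F1 (the harmonic mean of precision and recall, β = 1), because the abstract does not state β or whether per-class scores were macro- or weighted-averaged. The function name is illustrative.

```python
def metrics_from_confusion(cm, labels):
    """cm[i][j] counts samples of true class labels[i] predicted as labels[j]."""
    n = len(labels)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total  # diagonal = correct predictions

    per_class = {}
    for i, name in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum = TP + FP
        actual = sum(cm[i])                          # row sum = TP + FN
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[name] = (precision, recall, f1)
    return accuracy, per_class
```

For the toy matrix `[[8, 2], [1, 9]]`, accuracy is 17/20 = 0.85, and the first class has precision 8/9 and recall 0.8.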
