A Salp Swarm-Based Under-Sampling Approach for Medical Imbalanced Data Classification

Mohammed Hussein Ibrahim

doi:10.31590/ejosat.1082451

Abstract

Data imbalance refers to the unequal distribution of classes within a dataset, which directly affects the accuracy of machine learning classification algorithms. Although researchers have proposed many resampling techniques, learning from imbalanced data is still considered an open challenge. The class imbalance problem remains difficult because most existing techniques do not handle the similarity relationships between minority and majority classes well. In addition, because of the complex relationships among classes, most existing techniques do not adequately retain valuable samples in the majority class(es). In this article, a salp swarm optimization-based under-sampling technique (SSBUT) is proposed to address the class imbalance problem. With SSBUT, the similarity relationships among the samples of the majority class are analyzed, and the samples that do not affect the accuracy of the classification algorithm are eliminated from the majority class. The performance of the proposed SSBUT has been tested on benchmark imbalanced medical datasets, and the results have been compared with state-of-the-art under-sampling techniques. The experimental results show that the proposed SSBUT consistently outperformed the state-of-the-art under-sampling techniques in terms of various evaluation criteria.
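The general idea of swarm-driven under-sampling can be illustrated with a small sketch. The abstract does not specify the solution encoding, the fitness function, or the classifier used by SSBUT, so everything below is an assumption made for illustration: each salp is a vector in [0, 1] with one entry per majority-class sample (an entry above 0.5 means "keep"), the fitness is the balanced accuracy of a 1-nearest-neighbor classifier minus a small penalty on the fraction of majority samples kept, and the position updates follow the standard salp swarm algorithm. The names `ssa_undersample`, `fitness`, and `penalty` are hypothetical and do not come from the article.

```python
import numpy as np


def one_nn_predict(X_train, y_train, X_test):
    # Squared Euclidean distances between every test and training point.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]


def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls, so the minority class counts equally.
    return float(np.mean([(y_pred[y_true == c] == c).mean()
                          for c in np.unique(y_true)]))


def fitness(position, X, y, major_idx, minor_idx, penalty=0.1):
    # ASSUMED fitness: not taken from the article.
    keep = position > 0.5               # binarize the continuous salp position
    if not keep.any():
        return -1.0                     # reject an empty majority class
    train_idx = np.concatenate([major_idx[keep], minor_idx])
    pred = one_nn_predict(X[train_idx], y[train_idx], X)
    # Reward accuracy on the full data, lightly penalize kept majority samples.
    return balanced_accuracy(y, pred) - penalty * keep.mean()


def ssa_undersample(X, y, majority_label, n_salps=8, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    major_idx = np.flatnonzero(y == majority_label)
    minor_idx = np.flatnonzero(y != majority_label)
    dim = major_idx.size
    salps = rng.random((n_salps, dim))  # positions in [0, 1]
    scores = np.array([fitness(s, X, y, major_idx, minor_idx) for s in salps])
    food, food_score = salps[scores.argmax()].copy(), scores.max()

    for it in range(1, n_iter + 1):
        # c1 shrinks over time, moving from exploration to exploitation.
        c1 = 2.0 * np.exp(-((4.0 * it / n_iter) ** 2))
        # Leader moves around the food source (bounds are lb=0, ub=1).
        c2, c3 = rng.random(dim), rng.random(dim)
        step = c1 * c2
        salps[0] = np.where(c3 >= 0.5, food + step, food - step)
        # Each follower moves to the midpoint with the salp ahead of it.
        for i in range(1, n_salps):
            salps[i] = 0.5 * (salps[i] + salps[i - 1])
        salps = np.clip(salps, 0.0, 1.0)
        scores = np.array([fitness(s, X, y, major_idx, minor_idx)
                           for s in salps])
        if scores.max() > food_score:
            food, food_score = salps[scores.argmax()].copy(), scores.max()

    return food > 0.5                   # boolean mask over the majority samples
```

Under these assumptions, the returned mask selects which majority-class rows to keep, for example `X[y == majority_label][mask]`, and the reduced majority set is then combined with all minority samples before training. The penalty weight controls how aggressively majority samples are discarded; the value 0.1 is an arbitrary choice for the sketch.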
