Abstract

Data imbalance refers to an unequal distribution of classes within a dataset, which directly affects the accuracy of machine learning classification algorithms. Although researchers have proposed many resampling techniques, learning from imbalanced data is still considered an open challenge. The class imbalance problem is complicated by the fact that most existing techniques do not handle the similarity relationships between minority and majority classes well. In addition, because of the complex relationships among classes, most existing techniques do not adequately retain valuable samples in the majority class(es). In this article, a salp swarm optimization-based under-sampling technique (SSBUT) is proposed to address the class imbalance problem. With SSBUT, the similarity relationships among the samples of the majority class are analyzed, and the samples that do not affect the accuracy of the classification algorithm are eliminated from the majority class. The performance of SSBUT was tested on benchmark imbalanced medical datasets, and the results were compared with those of state-of-the-art under-sampling techniques. The experimental results show that SSBUT consistently outperformed these techniques across various evaluation criteria.
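To make the idea of optimization-driven under-sampling concrete, the following is a minimal sketch and not the SSBUT algorithm itself. It treats a candidate solution as a keep/drop mask over the majority-class samples and scores it by classifier accuracy plus a small reward for removing samples. Several things here are assumptions made for illustration: a random bit-flip local search is swapped in for the salp swarm optimizer, a 1-nearest-neighbour classifier with balanced accuracy stands in for the unspecified classifier and evaluation criteria, and the names `undersample`, `balanced_accuracy`, `nearest_label`, and the `0.01` reward weight are hypothetical.

```python
import random


def nearest_label(x, points, labels):
    """Label of the training point closest to x (1-NN, squared Euclidean)."""
    best = min(
        range(len(points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, points[i])),
    )
    return labels[best]


def balanced_accuracy(train_x, train_y, val_x, val_y):
    """Mean per-class recall of 1-NN on the validation set.

    Assumes labels are 0 (majority) and 1 (minority) and that both
    classes appear in the validation set.
    """
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for x, y in zip(val_x, val_y):
        totals[y] += 1
        if nearest_label(x, train_x, train_y) == y:
            hits[y] += 1
    return sum(hits[c] / totals[c] for c in totals) / 2


def undersample(maj, mino, val_x, val_y, iters=200, seed=0):
    """Drop majority samples whose removal does not hurt validation accuracy.

    Stand-in search: random bit-flip hill climbing (NOT salp swarm).
    """
    rng = random.Random(seed)
    mask = [True] * len(maj)  # True means "keep this majority sample"

    def fitness(m):
        kept = [p for p, keep in zip(maj, m) if keep]
        if not kept:
            return -1.0  # never allow the majority class to vanish
        labels = [0] * len(kept) + [1] * len(mino)
        acc = balanced_accuracy(kept + mino, labels, val_x, val_y)
        # Hypothetical weighting: small bonus for each removed sample.
        return acc + 0.01 * (1 - len(kept) / len(maj))

    best = fitness(mask)
    for _ in range(iters):
        i = rng.randrange(len(maj))
        mask[i] = not mask[i]  # propose flipping one sample
        score = fitness(mask)
        if score >= best:
            best = score  # accept the flip
        else:
            mask[i] = not mask[i]  # reject and revert
    return [p for p, keep in zip(maj, mask) if keep]
```

Calling `undersample(maj, mino, val_x, val_y)` with lists of feature tuples returns the retained majority samples. A population-based optimizer such as salp swarm would explore many masks in parallel instead of flipping one bit at a time, but the mask encoding and the accuracy-based fitness would play the same role.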
