Abstract

Biomedical data are widely used to develop prediction models for tasks such as identifying specific tumours, drug discovery and human cancer detection. However, previous studies have usually focused on comparing classifiers and have overlooked the class imbalance problem in real-world biomedical datasets. This paper reviews and evaluates popular and recently developed resampling and feature selection (FS) methods for class imbalance learning, taking the data distribution into account. Experimental results show that: 1) resampling and FS techniques exhibit better performance when used with a support vector machine (SVM) classifier; 2) random undersampling and FS perform better than other data pre-processing methods on data following a T location-scale distribution when using SVM and K-nearest neighbours (KNN) classifiers, while random oversampling outperforms other methods on data following a negative binomial distribution when using Random Forest at lower imbalance ratios; 3) FS outperforms other data pre-processing methods in most cases, so FS with an SVM classifier is the best choice for learning from imbalanced biomedical data.
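
For readers unfamiliar with the two resampling baselines named above, the following is a minimal sketch of random undersampling and random oversampling. It uses only the Python standard library and is not the implementation evaluated in this paper; the function name `random_resample` and its parameters are hypothetical, chosen for illustration.

```python
import random
from collections import defaultdict


def random_resample(X, y, mode="under", seed=0):
    """Balance class counts by random under- or oversampling.

    Hypothetical helper for illustration only, not the paper's code.
    mode="under": randomly drop samples until every class matches the smallest.
    mode="over":  randomly duplicate samples until every class matches the largest.
    """
    if mode not in ("under", "over"):
        raise ValueError("mode must be 'under' or 'over'")
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    sizes = [len(idxs) for idxs in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)

    chosen = []
    for idxs in by_class.values():
        if len(idxs) > target:
            # Undersampling: keep a random subset, drawn without replacement.
            chosen.extend(rng.sample(idxs, target))
        else:
            # Oversampling: keep all samples, then add random duplicates.
            extra = [rng.choice(idxs) for _ in range(target - len(idxs))]
            chosen.extend(idxs + extra)

    rng.shuffle(chosen)
    return [X[i] for i in chosen], [y[i] for i in chosen]
```

With 8 majority and 2 minority samples, `mode="under"` would return 2 samples per class and `mode="over"` would return 8 per class. In practice, resampling is normally applied to the training split only, so that the test set keeps its original imbalance.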
