Abstract

AbstractA common situation in classification tasks is to deal with unbalanced datasets, an issue that appears when the majority class(es) has a large number of samples compared to the minority class(es). This problem is even more significant when the datasets have a large number of features but only a few samples, as is the case with microarray datasets. Traditionally, an approach to alleviate this problem has been the application of sampling methods to obtain more balanced classes, increasing the number of samples in the minority class (replicating samples or generating new synthetic samples), or decreasing the number of samples in the majority class. In this study, we have compared different balancing methods, including a novel method that applies sampling in both the minority and majority classes. The interest in applying feature selection in combination with balancing methods has also been explored. In view of the results, a recommendation of sampling method, feature selection, and classifier is proposed to improve the classification results according to the type of dataset.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call