Evolutionary simultaneous under and oversampling of instances for dealing with class-imbalance datasets in multilabel problems

Nicolás García-Pedrajas, José M Cuevas-Muñoz, Aida De Haro-García

doi:10.1016/j.asoc.2024.111618

Abstract

Multilabel classification, in which each instance can be associated with several classes (or labels) at once, has recently attracted great attention from the data mining research community. Class imbalance arises in any classification task when some classes are represented by far fewer instances than others. In multilabel classification the problem is ubiquitous, as a large percentage of labels have a class-imbalanced distribution. Adapting single-label methods for class imbalance to multilabel learning is problematic, because many of their basic concepts do not transfer easily. In this paper, we propose using evolutionary computation to simultaneously oversample the minority class and undersample the majority class in multilabel problems. Letting the algorithm autonomously select the instances to undersample and oversample allows us to extend these two successful paradigms to the multilabel task. An extensive comparison on 35 datasets shows the advantages of this approach for class-imbalanced multilabel problems over previously published methods and over the basic classification algorithms trained on the original datasets.
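
To make the idea of simultaneous under- and oversampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical encoding in which a candidate solution is one integer per instance (0 drops the instance, 1 keeps it, k > 1 replicates it k times), and it uses a simple majority-to-minority count ratio per label as the imbalance measure. The function names `imbalance_ratios` and `resample` are illustrative only; the actual encoding, fitness function and evolutionary operators are those described in the full text.

```python
from typing import List, Sequence, Tuple


def imbalance_ratios(Y: Sequence[Sequence[int]]) -> List[float]:
    """Per-label ratio of majority count to minority count.

    Y is a binary label matrix (one row per instance, one column per label).
    A label with an absent class gets an infinite ratio.
    """
    n = len(Y)
    ratios = []
    for j in range(len(Y[0])):
        pos = sum(row[j] for row in Y)
        neg = n - pos
        minority, majority = min(pos, neg), max(pos, neg)
        ratios.append(float("inf") if minority == 0 else majority / minority)
    return ratios


def resample(X: Sequence, Y: Sequence, counts: Sequence[int]) -> Tuple[list, list]:
    """Apply an integer chromosome (assumed encoding, see lead-in).

    counts[i] == 0 removes instance i (undersampling),
    counts[i] == 1 keeps it unchanged,
    counts[i] >  1 replicates it (oversampling).
    """
    X_new, Y_new = [], []
    for x, y, c in zip(X, Y, counts):
        X_new.extend([x] * c)
        Y_new.extend([y] * c)
    return X_new, Y_new


# Toy data: six instances, two labels, each label positive only once.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
Y = [[1, 0], [0, 0], [0, 1], [0, 0], [0, 0], [0, 0]]

# One hand-written candidate: replicate the two positive instances,
# drop two all-negative instances, keep the rest.
counts = [3, 1, 3, 0, 0, 1]

X_res, Y_res = resample(X, Y, counts)
print(imbalance_ratios(Y))      # ratios on the original data
print(imbalance_ratios(Y_res))  # ratios after resampling
```

Because an instance carries all of its labels with it, replicating or removing it changes the class distribution of every label at the same time. An evolutionary search would score many such `counts` vectors with a fitness function and keep the best ones, rather than relying on a hand-written vector as in this toy example.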

Full Text