Abstract

Multi-label classification has received considerable attention because of its wide range of application domains. However, multi-label datasets often suffer from class imbalance, which poses challenges for classification algorithms. Oversampling is one of the most important approaches to this problem, as it generates instances of minority labels to balance the class distribution. Existing oversampling methods, however, ignore label correlations, which leads to inappropriate synthetic minority samples and makes multi-label classification harder. In this work, we propose an oversampling method that considers label correlations and identifies two critical boundary regions for generating synthetic minority samples. Moreover, we propose a weighting strategy that assigns weights to these instances based on their distance information. To evaluate the proposed method, we conducted experiments on sixteen public datasets. The results show that our approach outperforms state-of-the-art approaches on various evaluation metrics, such as Macro F1 and Macro AUC. The code is available at https://github.com/IntelliDAL/Multi-label/tree/main/LCOS.
