Abstract

Label imbalance is one of the characteristics of multilabel data, and imbalanced data seriously degrades classifier performance. In multilabel classification, resampling is the most common way to deal with imbalance. Existing resampling methods balance the data by either undersampling or oversampling, which causes information loss or overfitting, and resampling has a significant impact on the minority labels. Furthermore, the high concurrency of majority and minority labels in many instances also harms classification performance. In this study, we propose a bidirectional resampling method that decouples multilabel datasets. On one hand, label concurrency is reduced by decoupling instances until a termination condition is met; on the other hand, the loss of instance information and overfitting are alleviated by combining oversampling and undersampling. The minority labels of each instance are measured, and the instances that have less impact on minority labels are selected for resampling. The number of resampled instances is limited so that the original distribution of the data is preserved during the resampling phase. Experiments on seven benchmark multilabel datasets demonstrate the effectiveness of the algorithm, especially on datasets with high concurrency of majority and minority labels.

Highlights

  • With the advent of the era of big data, data classification has received much attention in recent years. Imbalanced data often occurs in the field of data classification, including medical data.

  • In tumor classification, nontumor patients are the majority class and tumor patients are the minority class [1], yet the minority of tumor patients is the class of greater concern. The same problem arises in medical imaging classification, credit card fraud detection [2], network intrusion identification, and similar fields.

  • The Multilabel Decoupling Bidirectional Resampling algorithm (ML-DBR) calculates the SCUMBLEIns value for each instance in the dataset, records the initial SCUMBLE(D) of the dataset as SCUMBLE(D)1, and uses SCUMBLE(D)1 to decide which instances to decouple, so as to reduce the number of instances with highly concurrent labels. If SCUMBLEIns(i) > SCUMBLE(D)1, the instance Di is cloned as D′i. With Li the label set of Di and L′i the label set of D′i, the labels are split by Imbalance Ratio per Label (IR) against the Mean Imbalance Ratio (MeanIR): L′i ← Li[IR(y) ≥ MeanIR] and Li ← Li[IR(y) ≤ MeanIR]. Then, each time another 1% of the instances in the dataset have been decoupled, the SCUMBLE(D) of the decoupled dataset is recalculated.
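
The decoupling step in the last highlight can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The excerpt does not give the formulas, so the sketch assumes the commonly used definitions: IR(y) is the count of the most frequent label divided by the count of label y, MeanIR is the average of IR(y) over all labels, and SCUMBLEIns is one minus the ratio of the geometric mean to the arithmetic mean of the IR values of an instance's active labels. The function names are hypothetical. The periodic recalculation of SCUMBLE(D) and the termination condition are not specified in this excerpt, so the sketch performs a single pass only.

```python
import math

def imbalance_ratios(Y):
    """IR(y) per label: count of the most frequent label / count of label y.
    Assumes every label occurs at least once in Y (rows of 0/1 values)."""
    n_labels = len(Y[0])
    counts = [sum(row[j] for row in Y) for j in range(n_labels)]
    top = max(counts)
    return [top / c for c in counts]

def scumble_ins(row, ir):
    """SCUMBLEIns for one instance (assumed definition): 1 - geometric mean /
    arithmetic mean of the IR values of the labels active in the instance."""
    active = [ir[j] for j, v in enumerate(row) if v]
    if not active:
        return 0.0
    geo = math.exp(sum(math.log(a) for a in active) / len(active))
    return 1.0 - geo / (sum(active) / len(active))

def decouple_once(X, Y):
    """Single decoupling pass: instances whose SCUMBLEIns exceeds the initial
    SCUMBLE(D) are split into an original and a clone."""
    ir = imbalance_ratios(Y)
    mean_ir = sum(ir) / len(ir)
    scores = [scumble_ins(row, ir) for row in Y]
    scumble_d1 = sum(scores) / len(scores)  # initial SCUMBLE(D)
    X_out, Y_out = [], []
    for x, row, score in zip(X, Y, scores):
        if score > scumble_d1:
            # Original Di keeps the labels with IR(y) <= MeanIR.
            kept = [v if ir[j] <= mean_ir else 0 for j, v in enumerate(row)]
            # Clone D'i keeps the labels with IR(y) >= MeanIR.
            cloned = [v if ir[j] >= mean_ir else 0 for j, v in enumerate(row)]
            X_out.extend([x, x])
            Y_out.extend([kept, cloned])
        else:
            X_out.append(x)
            Y_out.append(list(row))
    return X_out, Y_out
```

For a toy label matrix `[[1, 0], [1, 0], [1, 0], [1, 1]]`, only the last instance carries both the frequent and the rare label, so it is the only one split: the original keeps the frequent label and the clone keeps the rare one.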


Summary

Research Article

A Decoupling and Bidirectional Resampling Method for Multilabel Classification of Imbalanced Data with Label Concurrence. Label imbalance is one of the characteristics of multilabel data, and imbalanced data seriously degrades classifier performance. Existing resampling methods balance the data by either undersampling or oversampling, which causes overfitting and information loss. The high concurrency of majority and minority labels in many instances also harms classification performance. On one hand, label concurrency can be reduced by setting termination conditions for decoupling; on the other hand, the loss of instance information and overfitting can be alleviated by combining oversampling and undersampling. The minority labels of each instance are measured, and the instances that have less impact on minority labels are selected for resampling. The number of resampled instances is limited so that the original distribution of the data is preserved during the resampling phase. Experiments on seven benchmark multilabel datasets demonstrate the effectiveness of the algorithm, especially on datasets with high concurrency of majority and minority labels.
