Abstract

Multi-label classification is the task of classifying instances that can each be tagged with any of the 2^L possible subsets of L predefined labels. It is commonly applied in domains such as multimedia, text, web, and biological data analysis. The main challenge in multi-label classification is the dilemma between optimising label correlations over an exponentially large label powerset and ignoring label correlations under the binary relevance strategy (1-vs-all heuristic). Classification over the label powerset usually encounters a highly skewed data distribution, known as the imbalance problem. While the binary relevance strategy reduces the problem from exponential to linear, it completely neglects label correlations. In this article, we propose a novel strategy that introduces Balanced Pseudo-Labels (BPL) to build more robust classifiers for multi-label classification, a problem in which imbalanced data is innate. By incorporating the new balanced labels, we aim to increase the average distance among the distinct label vectors. In this way, we also encode label correlations implicitly in the algorithm. Another advantage of the proposed method is that it can be combined with any classifier and that it is proportional to a linear label transformation. In the experiments, we choose five multi-label benchmark data sets and compare our algorithm with state-of-the-art algorithms. Our algorithm outperforms them on standard multi-label evaluation measures in most scenarios.
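The trade-off between the label powerset and binary relevance views can be illustrated with a minimal sketch. It is not the BPL algorithm, which the abstract does not specify. The toy label matrix `Y` is hypothetical, and the use of Hamming distance as the distance between label vectors is an assumption made only for illustration.

```python
from collections import Counter

# Hypothetical toy label matrix: 6 instances, L = 3 labels.
Y = [
    (1, 0, 0),
    (1, 0, 0),
    (1, 0, 0),
    (1, 1, 0),
    (0, 1, 1),
    (1, 0, 0),
]
L = len(Y[0])

# Label powerset view: every distinct label vector is its own class,
# so up to 2**L classes can occur and their frequencies are often skewed.
powerset_counts = Counter(Y)

# Binary relevance view: one independent binary problem per label,
# so only L problems, but co-occurrence between labels is discarded.
positives_per_label = [sum(row[j] for row in Y) for j in range(L)]


def hamming(a, b):
    """Number of positions where two label vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Average pairwise distance among the distinct label vectors
# (Hamming distance is an assumption; the abstract names no metric).
distinct = sorted(powerset_counts)
pairs = [(a, b) for i, a in enumerate(distinct) for b in distinct[i + 1:]]
mean_hamming = sum(hamming(a, b) for a, b in pairs) / len(pairs)

print("possible label subsets:", 2 ** L)
print("powerset class counts:", dict(powerset_counts))
print("positives per label:", positives_per_label)
print("mean pairwise Hamming distance:", mean_hamming)
```

In this toy data, one label vector accounts for four of the six instances while the other two occur once each, which is the kind of skew the powerset view is exposed to. The per-label counts are less extreme but carry no information about which labels occur together.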
