Abstract

Multi-label data classification has become an important and active research topic, where the classification algorithm is required to deal with prediction of sets of label indicators for instances simultaneously. Label powerset (LP) method reduces the multi-label classification problem to a single-label multi-class classification problem by treating each distinct combination of labels. However, the predictive performance of LP is challenged with imbalanced distribution among the labelsets, deteriorating the performance of traditional classifiers. In this paper, we study the problem of multi-label imbalanced data classification and propose a novel solution, called CSRankSVM (Cost sensitive Ranking Support Vector Machine), which assigns a different misclassification cost for each labelset to effectively tackle the problem of imbalance for Multi-label data. Empirical studies on popular benchmark datasets with various imbalance ratios of labelsets demonstrate that the proposed CSRankSVM approach can effectively boost classification performances in multi-label datasets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call