Abstract

This article proposes a binary tree of classifiers for multi-label classification that preserves label dependencies and handles class imbalance. At each node, the input data is strategically split into two subsets for the child nodes while keeping the label correlations intact. A novel approach for partitioning the data based on label-set proximity is also proposed. Data-appropriate classifiers are trained to learn the binary partition at every internal node. The tree of classifiers grows iteratively depending on two parameters, multi-label entropy and sample cardinality, computed on the data at the current node. During training, the decision at any node is based on these parameters, and branching is restricted if deemed unnecessary. Dedicated classifiers at the leaf nodes perform the final classification and assign appropriate label-sets to the unlabelled data. The proposed system splits the data and builds the hierarchical structure so that the training and classification tasks become simpler. Class imbalance, which otherwise causes irregular data splits and excessive branching of the tree, is handled through the novel use of suitable classifiers and parameters at the intermediate and leaf nodes. The proposed method shows significant performance improvement on fourteen datasets against fourteen existing multi-label classifiers. Two-tailed Wilcoxon signed-rank test statistics show that for <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$T_{Wilcoxon}(14,0.2)=31$</tex-math></inline-formula> the proposed method outperforms all the other comparison models.
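The branching decision described in the abstract depends on two quantities computed at each node: multi-label entropy and sample cardinality. As a minimal sketch (the paper's exact formulas and threshold values are not given here, so the sum-of-binary-entropies definition and the cutoff parameters below are assumptions), one common way to compute these quantities and gate further branching is:

```python
import numpy as np

def multilabel_entropy(Y):
    """Sum of per-label binary entropies for a 0/1 label matrix Y
    (rows = samples, columns = labels). One common definition of
    multi-label entropy; the paper's exact formula may differ."""
    p = Y.mean(axis=0)
    # Skip labels with p = 0 or p = 1 to avoid log(0); they contribute 0.
    mask = (p > 0) & (p < 1)
    p = p[mask]
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)).sum())

def label_cardinality(Y):
    """Average number of labels assigned per sample."""
    return float(Y.sum(axis=1).mean())

def should_branch(Y, entropy_threshold=0.5, min_samples=10):
    """Grow child nodes only if the labels at this node are still
    heterogeneous and enough samples remain; otherwise stop and make
    the node a leaf. Threshold values here are illustrative
    placeholders, not the paper's tuned parameters."""
    if len(Y) < min_samples:
        return False
    return multilabel_entropy(Y) > entropy_threshold
```

With a mixed label matrix the entropy is high and branching continues; once a node's samples all share the same label-set, the entropy drops to zero and the node becomes a leaf.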
