Abstract

In multi-label classification, an instance may be associated with multiple labels simultaneously, so class labels are correlated rather than mutually exclusive. As new applications emerge, not only do the number of instances and the feature dimensionality grow, but the dimensionality of the label space also increases quickly, which raises computational costs and can even degrade classification performance. To address this, dimensionality reduction is applied to the label space by exploiting label correlation information, giving rise to label embedding and label selection techniques. Compared with the large body of work on label embedding, label selection has received less attention because of its difficulty, so designing more effective label selection techniques for multi-label classification remains a challenging task. Boolean matrix decomposition (BMD) approximates a binary matrix by the Boolean product of two low-rank binary matrices. Boolean interpolative decomposition (BID) further forces the left low-rank matrix to be a subset of the columns of the original matrix, which amounts to selecting a few informative binary labels for multi-label classification. Since BID is NP-hard, a more effective heuristic solution method is needed. In this paper, we first perform an exact BMD, which reconstructs the label matrix exactly while removing a few uninformative labels, and then apply a sequential backward selection (SBS) strategy that deletes less informative labels one by one until a column subset of fixed size remains. This yields a novel label selection algorithm based on BID with SBS. The proposed method is evaluated experimentally on six benchmark data sets with more than 100 labels, using two performance metrics suited to the high-dimensional label setting: precision@n and discounted gain@n (n = 1, 3 and 5).
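The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of BID with SBS under simplifying assumptions; it is not the authors' implementation. Assumptions: the label matrix is a binary NumPy array `Y` with one column per label; for a fixed selected subset, the right factor is built by a containment rule (selected label l is used to rebuild label j only if l's positive instances are a subset of j's), which never adds false positives and is exact whenever an exact reconstruction from that subset exists; reconstruction error is the number of mismatched entries. All function names (`boolean_product`, `decode`, `reconstruction_error`, `exact_reduce`, `bid_sbs`, `precision_at_n`) are hypothetical.

```python
import numpy as np


def boolean_product(C, X):
    """Boolean matrix product: (C o X)[i, j] = OR_l (C[i, l] AND X[l, j])."""
    return (C.astype(np.int64) @ X.astype(np.int64)) > 0


def decode(C, Y):
    """Assumed decoding rule (not from the paper): X[l, j] = 1 iff selected
    column C[:, l] is contained in label column Y[:, j]. This never creates
    false positives, so C o X <= Y holds entrywise."""
    # Count ones of C[:, l] lying outside Y[:, j]; zero means containment.
    outside = C.astype(np.int64).T @ (~Y).astype(np.int64)
    return outside == 0


def reconstruction_error(Y, S):
    """Hamming error between Y and its Boolean reconstruction from columns S."""
    Y = np.asarray(Y, dtype=bool)
    C = Y[:, np.asarray(S, dtype=np.intp)]
    X = decode(C, Y)
    return int(np.sum(boolean_product(C, X) != Y))


def exact_reduce(Y):
    """Exact-BMD step: drop labels one at a time while reconstruction stays exact."""
    S = list(range(np.shape(Y)[1]))
    for j in range(np.shape(Y)[1]):
        rest = [l for l in S if l != j]
        if reconstruction_error(Y, rest) == 0:
            S = rest
    return S


def bid_sbs(Y, k):
    """Sequential backward selection down to (at most) k labels."""
    S = exact_reduce(Y)
    while len(S) > k:
        # Reconstruction error after removing each remaining label in turn.
        errors = [reconstruction_error(Y, [l for l in S if l != j]) for j in S]
        S.pop(int(np.argmin(errors)))  # drop the least informative label
    return S


def precision_at_n(Y_true, scores, n):
    """Mean fraction of true labels among each instance's top-n scored labels."""
    top = np.argsort(-scores, axis=1)[:, :n]
    hits = np.take_along_axis(np.asarray(Y_true, dtype=bool), top, axis=1)
    return float(hits.sum(axis=1).mean() / n)


if __name__ == "__main__":
    # Toy label matrix: column 2 is the Boolean OR of columns 0 and 1.
    Y = np.array([[1, 0, 1, 1],
                  [0, 1, 1, 0],
                  [1, 1, 1, 1],
                  [0, 0, 0, 1]], dtype=bool)
    print(exact_reduce(Y))                  # → [0, 1, 3]
    print(bid_sbs(Y, 2))                    # → [0, 1]
    print(reconstruction_error(Y, [0, 1]))  # → 1
```

In the toy example, the exact step removes label 2 because it is the union of labels 0 and 1, and SBS then drops label 3, whose removal costs only one mismatched entry. Each SBS step re-evaluates every remaining candidate, so this sketch favours clarity over speed; if the exact step already leaves k or fewer labels, `bid_sbs` returns them unchanged.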

