Abstract

Hierarchical multi-label classification (HMC) is a practically relevant machine learning task with applications ranging from text categorization and image annotation to functional genomics. State-of-the-art results for HMC are obtained with ensembles of predictive models, especially ensembles of predictive clustering trees. Predictive clustering trees (PCTs) generalize decision trees towards HMC and can be combined into ensembles using techniques such as bagging and random forests. Two major issues influence the performance of HMC methods: (1) the computational bottleneck imposed by the size of the label hierarchy, which can easily reach tens of thousands of labels, and (2) the sparsity of annotations in the label (output) space. To address these limitations, we propose an approach that combines graph node embeddings with a specific property of PCTs: the descriptive, clustering and target attributes can be specified arbitrarily. We adapt Poincaré hyperbolic node embeddings to obtain low-dimensional label set embeddings, which are then used to guide PCT construction instead of the original label space. Because the embeddings have far fewer dimensions than the original label space, the time needed to construct a tree is greatly reduced. The input and output spaces remain the same: the tests in the tree use the original attributes, and the leaves predict the original labels directly. We empirically evaluate the proposed approach on 9 datasets. The results show that our approach dramatically reduces the computational cost of learning and can lead to improved predictive performance.
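
The central idea (choose tree tests by how well they cluster low-dimensional label set embeddings, but store and predict the original labels in the leaves) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the per-label node embeddings are taken as precomputed (for example, trained with a Poincaré embedding tool such as gensim's `PoincareModel`); a label set is embedded as the plain Euclidean mean of its labels' node embeddings, a hypothetical simplification rather than the adaptation used in the paper; split quality is variance reduction measured with Euclidean distance in the embedding space; and only a single split (a stump) is grown instead of a full PCT. The helper names `label_set_embedding`, `best_split` and `fit_stump`, and the toy data, are hypothetical.

```python
import numpy as np


def label_set_embedding(Y, node_emb):
    """Embed each example's label set as the mean of its labels' node embeddings.

    Y:        (n, L) binary matrix in the original label space.
    node_emb: (L, d) node embeddings with d much smaller than L.
    Assumption: an example with no labels is mapped to the origin.
    """
    counts = Y.sum(axis=1, keepdims=True)
    sums = Y @ node_emb
    return np.divide(sums, counts, out=np.zeros(sums.shape), where=counts > 0)


def sum_sq_dev(Z):
    """Sum of squared deviations from the mean (variance times row count)."""
    if len(Z) == 0:
        return 0.0
    return float(((Z - Z.mean(axis=0)) ** 2).sum())


def best_split(X, Z):
    """Return (feature, threshold, gain) maximizing variance reduction on Z.

    Tests use the original descriptive attributes X, but their quality is
    measured on the clustering attributes Z (the d-dimensional embeddings),
    so each candidate costs O(n * d) instead of O(n * L).
    """
    parent = sum_sq_dev(Z)
    best = (None, None, 0.0)
    for j in range(X.shape[1]):
        # Skip the largest value so both sides of the split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            gain = parent - sum_sq_dev(Z[left]) - sum_sq_dev(Z[~left])
            if gain > best[2]:
                best = (j, t, gain)
    return best


def fit_stump(X, Y, node_emb):
    """Grow one split guided by embeddings; leaves predict original labels."""
    Z = label_set_embedding(Y, node_emb)
    j, t, _ = best_split(X, Z)
    if j is None:  # no useful split: a single leaf
        proto = Y.mean(axis=0)
        return lambda Xn: np.tile(proto, (len(Xn), 1))
    left = X[:, j] <= t
    # Leaf prototypes live in the ORIGINAL label space (per-label frequencies).
    proto_left, proto_right = Y[left].mean(axis=0), Y[~left].mean(axis=0)

    def predict(Xn):
        return np.where((Xn[:, j] <= t)[:, None], proto_left, proto_right)

    return predict


# Toy hierarchy: labels A, A/a1, B, B/b1 with hypothetical 2-D embeddings.
node_emb = np.array([[0.1, 0.0], [0.6, 0.1], [-0.1, 0.0], [-0.6, -0.1]])
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]])
predict = fit_stump(X, Y, node_emb)
print(predict(np.array([[0.5], [2.5]])))  # rows: [1, 0.5, 0, 0] and [0, 0, 1, 0.5]
```

Replacing the mean with a hyperbolic centroid, or measuring dispersion with the Poincaré distance, would be closer in spirit to hyperbolic embeddings; the Euclidean version is used here only to keep the sketch short.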

