Bonsai: diverse and shallow trees for extreme multi-label classification

Sujay Khandagale,Rohit Babbar,Han Xiao

doi:10.1007/s10994-020-05888-2


Open Access


Journal: Machine Learning
Publication Date: Aug 23, 2020
Citations: 72
License type: open-access

Affiliation: Aalto University

Abstract

Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space: (i) the input space, which is spanned by the input features, (ii) the output space, spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space, obtained by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds: fast training that is comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail labels. On a benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy, while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.
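
The three label representation spaces named in the abstract can be illustrated with a small sketch. It assumes that a label's vector is the sum of the chosen per-instance vectors over the training instances carrying that label, scaled to unit length; this aggregation rule, the function names, and the toy data are illustrative assumptions and are not taken from the Bonsai code base.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else list(vec)


def label_representations(X, Y, space="input"):
    """Build one vector per label (hypothetical helper, not the authors' API).

    X: list of n feature rows (length d each).
    Y: list of n binary label rows (length L each).
    space: "input" sums feature vectors, "output" sums label vectors
           (label co-occurrence), "joint" sums their concatenation.
    """
    if space == "input":
        Z = [list(x) for x in X]
    elif space == "output":
        Z = [list(y) for y in Y]
    elif space == "joint":
        Z = [list(x) + list(y) for x, y in zip(X, Y)]
    else:
        raise ValueError(f"unknown space: {space}")

    num_labels, dim = len(Y[0]), len(Z[0])
    reps = [[0.0] * dim for _ in range(num_labels)]
    for z, y in zip(Z, Y):
        for label, is_relevant in enumerate(y):
            if is_relevant:
                for j, value in enumerate(z):
                    reps[label][j] += value
    return [normalize(r) for r in reps]


# Toy data (assumed): 3 instances, 2 features, 3 labels.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]

reps_input = label_representations(X, Y, "input")    # 3 vectors of length 2
reps_output = label_representations(X, Y, "output")  # 3 vectors of length 3
reps_joint = label_representations(X, Y, "joint")    # 3 vectors of length 5
```

Whichever space is chosen, the resulting label vectors are what a clustering step would partition to form the tree; the clustering itself is not shown here.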

Highlights

Extreme Multi-label Classification (XMC) refers to supervised learning of a classifier which can automatically label an instance with a small subset of relevant labels from an extremely large set of all possible target labels.
Our work generalizes the approach taken in many earlier works, which have represented labels only in the input space (Prabhu et al 2018; Wydmuch et al 2018), or only in the output space (Tsoumakas et al 2008). We show that these representations, when combined with shallow trees, surpass existing methods, demonstrating the efficacy of the proposed generalized representation.
The consistent improvement of Bonsai over Parabel on all datasets validates the choice of higher fanout and the advantages of using shallow trees.
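
To see why a higher fanout yields a shallower tree, the following sketch computes the depth of an idealized balanced tree with one label per leaf. The balanced-tree model and the fanout values 2 and 100 are assumptions chosen for illustration; the page does not state the actual fanout or depth used by Bonsai or Parabel.

```python
def balanced_tree_depth(num_labels, fanout):
    """Smallest depth at which a balanced tree with the given fanout
    has at least num_labels leaves (idealized model, one label per leaf)."""
    if num_labels < 1 or fanout < 2:
        raise ValueError("need num_labels >= 1 and fanout >= 2")
    depth, capacity = 0, 1
    while capacity < num_labels:
        capacity *= fanout
        depth += 1
    return depth


# Roughly the label count quoted for Amazon-3M in the abstract.
NUM_LABELS = 3_000_000

depth_binary = balanced_tree_depth(NUM_LABELS, fanout=2)    # 22 levels
depth_wide = balanced_tree_depth(NUM_LABELS, fanout=100)    # 4 levels
```

Under this simplified model, a binary tree needs 22 levels to separate three million labels, while a fanout of 100 needs only 4, so a prediction passes through far fewer internal decisions on its way to a leaf.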

Summary

Introduction

Extreme Multi-label Classification (XMC) refers to supervised learning of a classifier which can automatically label an instance with a small subset of relevant labels from an extremely large set of all possible target labels. From the machine learning perspective, building effective extreme classifiers poses a computational challenge that arises from the large number of (i) output labels, (ii) input training instances, and (iii) input features. Another important statistical characteristic of XMC datasets is that a large fraction of labels are tail labels, i.e., labels with very few training instances (a pattern also referred to as a power-law or fat-tailed distribution, or Zipf's law).
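
The notion of a tail label can be made concrete with a short sketch that measures what fraction of labels have at most a given number of training instances. The threshold, the function name, and the toy data are assumptions for illustration; the text above does not fix a specific cutoff.

```python
from collections import Counter


def tail_label_fraction(label_sets, num_labels, max_count):
    """Fraction of labels that occur in at most max_count training instances.

    label_sets: one iterable of label ids per training instance.
    """
    counts = Counter(label for labels in label_sets for label in labels)
    tail = sum(1 for label in range(num_labels) if counts[label] <= max_count)
    return tail / num_labels


# Toy data (assumed): label 0 is frequent, labels 2-4 are rare or unseen.
toy_label_sets = [[0, 1], [0], [0, 2], [0, 3], [0, 1]]
toy_fraction = tail_label_fraction(toy_label_sets, num_labels=5, max_count=1)
```

In this toy example, three of the five labels (ids 2, 3, and 4) appear at most once, so the tail fraction is 0.6.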

