Abstract

In the real world, we often encounter hierarchical classification problems with a large number of categories and deep hierarchies. In addition, the majority of categories do not have enough examples to train classifiers with good generalization performance. The feature space is usually also large, especially for text classification problems. Binary, multi-class, or multi-label approaches that treat hierarchical classification as a flat classification problem disregard the hierarchical relationships, fail to leverage the relatedness of categories in the learning process, and consequently perform poorly. Several approaches for hierarchical classification have been proposed in the literature, but most of them do not scale to large classification problems. In this paper, we study a hierarchical classification method that addresses the large-scale classification problem within the regularized risk minimization framework. Specifically, the method studied here exploits the hierarchical relationships between categories by imposing the constraint that the learned model vector for a category should be similar to that of its parent category. We study and analyze an approximate block coordinate descent procedure and compare its performance to a previously proposed exact coordinate descent method for this problem. We further examine the performance of this method on various aspects of the hierarchical classification problem, using large hierarchical text classification datasets.
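As a reader's sketch rather than the paper's exact formulation, the parent-similarity constraint described above is typically realized as a recursive regularizer over the category tree. Assuming notation not given in the abstract (a model vector $w_i$ per category $i$, parent $\pi(i)$, leaf set $\mathcal{L}$ carrying the training examples $(x_{ij}, y_{ij})$, a logistic loss, and a trade-off constant $C$), one plausible objective is

\[
\min_{\{w_i\}} \;\; \sum_{i \neq \mathrm{root}} \tfrac{1}{2}\,\lVert w_i - w_{\pi(i)} \rVert^2 \;+\; C \sum_{i \in \mathcal{L}} \sum_{j=1}^{n_i} \log\!\bigl(1 + e^{-y_{ij}\, w_i^{\top} x_{ij}}\bigr).
\]

Block coordinate descent minimizes such an objective one block $w_i$ at a time, holding all other blocks fixed; an approximate variant takes only a few gradient steps per block instead of solving each subproblem exactly. The following minimal Python sketch illustrates that idea only. The toy tree, data, step size, and the choice of logistic loss are all illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical toy setup: 4 categories in a small tree, d features.
# parent[i] is the parent of node i (-1 for the root); leaves carry data.
parent = [-1, 0, 0, 1]                      # root=0; 1,2 under root; 3 under 1
children = {0: [1, 2], 1: [3], 2: [], 3: []}
d = 5
rng = np.random.default_rng(0)
X = {i: rng.normal(size=(20, d)) for i in (2, 3)}    # leaf training data
y = {i: rng.choice([-1.0, 1.0], size=20) for i in (2, 3)}
C = 1.0
W = np.zeros((4, d))                        # one model vector per category

def block_grad(i):
    """Gradient of the objective w.r.t. w_i, all other blocks held fixed."""
    g = np.zeros(d)
    if parent[i] >= 0:                      # term (1/2)||w_i - w_parent||^2
        g += W[i] - W[parent[i]]
    for c in children[i]:                   # terms (1/2)||w_c - w_i||^2
        g += W[i] - W[c]
    if i in X:                              # logistic loss at data-bearing leaves
        m = y[i] * (X[i] @ W[i])            # margins y_j * w_i^T x_j
        g += -C * ((y[i] / (1.0 + np.exp(m))) @ X[i])
    return g

# Approximate block coordinate descent: a few gradient steps per block
# (an exact method would instead minimize each block subproblem fully).
for _ in range(50):                         # outer passes over the tree
    for i in range(4):                      # visit each category's block
        for _ in range(3):                  # inexact inner minimization
            W[i] -= 0.1 * block_grad(i)     # illustrative fixed step size
```

Note the design point implicit in the abstract's constraint: each block update touches only a node, its parent, and its children, so per-block work stays local even when the hierarchy is very large.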
