Optimal batch selection for active learning in multi-label classification

Shayok Chakraborty, Sethuraman Panchanathan, Vineeth Balasubramanian

doi:10.1145/2072298.2072028

Abstract

Multi-label classification is a generalization of conventional classification in which a single data point may carry multiple labels. Manually annotating a multi-label data point requires a human oracle to consider the presence or absence of every possible class separately, which is labor-intensive. Active learning techniques are effective at reducing the human labeling effort needed to induce a classification model. When exposed to large quantities of unlabeled data, such algorithms automatically select the salient and representative instances for manual annotation. Further, to address the high redundancy in data such as image or video sequences, as well as the availability of multiple labeling agents, recent work has explored a batch mode form of active learning, in which a batch of data points is selected simultaneously from an unlabeled set. In this work, we propose a novel optimization-based batch mode active learning strategy to minimize human labeling effort in multi-label classification problems. To the best of our knowledge, this is the first attempt to develop such a scheme primarily intended for the multi-label context. The proposed framework is computationally simple, easy to implement, and can be suitably modified to perform batch mode active learning in other formulations, such as single-label classification or problems involving hierarchical label spaces. Our results corroborate the efficacy of the proposed algorithm and demonstrate the framework's potential for real-world applications.
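The abstract does not spell out the optimization itself, so the following is not the authors' method. It is a minimal sketch of a common greedy baseline for batch mode selection in the multi-label setting, assuming a model that outputs one predicted probability per label for each unlabeled point. The function name `select_batch`, the per-label uncertainty measure, and the `diversity_weight` parameter are hypothetical choices for illustration: each step picks the point with the best combination of label uncertainty and distance from points already in the batch, which is one simple way to avoid spending a batch on near-duplicate frames.

```python
import numpy as np

def select_batch(probs, features, batch_size, diversity_weight=0.5):
    """Hypothetical greedy batch selection for multi-label active learning.

    probs:    (n, L) per-label predicted probabilities for n unlabeled points.
    features: (n, d) feature vectors for the same points.
    Returns a list of batch_size distinct indices into the unlabeled set.
    This is an illustrative heuristic, not the paper's optimization.
    """
    probs = np.asarray(probs, dtype=float)
    features = np.asarray(features, dtype=float)
    n = probs.shape[0]
    if batch_size > n:
        raise ValueError("batch_size exceeds the number of unlabeled points")

    # Uncertainty per point: mean over labels of 1 - |2p - 1|,
    # which is 1 when p = 0.5 and 0 when p is 0 or 1.
    uncertainty = np.mean(1.0 - np.abs(2.0 * probs - 1.0), axis=1)

    # Pairwise Euclidean distances, scaled to [0, 1].
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    if dist.max() > 0:
        dist = dist / dist.max()

    selected = []
    for _ in range(batch_size):
        score = uncertainty.copy()
        if selected:
            # Diversity: distance to the nearest point already in the batch.
            score += diversity_weight * dist[:, selected].min(axis=1)
            score[selected] = -np.inf  # never select the same point twice
        selected.append(int(np.argmax(score)))
    return selected
```

With a small `diversity_weight` the batch is driven mostly by label uncertainty and can contain redundant points; raising it pushes later picks toward unexplored regions of feature space. The paper's contribution is to replace this kind of hand-tuned greedy trade-off with a single optimization over the whole batch.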
