Ranked batch-mode active learning

Thiago N.C Cardoso, Rodrigo M Silva, Sérgio Canuto, Mirella M Moro, Marcos A Gonçalves

doi:10.1016/j.ins.2016.10.037

Abstract

We introduce a new paradigm for Ranked Batch-Mode Active Learning. It relaxes traditional Batch-Mode Active Learning (BMAL) by generating a query whose answer is an optimized ranked list of instances to be labeled, ordered according to some quality criteria, which allows batches to be arbitrarily large. This paradigm avoids the main problem of traditional BMAL, namely the frequent stops for manual labeling, reconciliation, and model reconstruction. In this article, we formally define the problem and introduce a framework that iteratively and effectively builds the ranked list. Our experimental evaluation shows that the proposed Ranked Batch approach significantly reduces the number of algorithm executions (and, consequently, the manual labeling delays) while maintaining or even improving the quality of the selected instances. In fact, when using only unlabeled data, our results are much better than those produced by pool-based batch-mode active learning methods that rely on already labeled seeds or update their models with labeled instances, with gains of up to 25% in MacroF1. Finally, our experiments show that our solutions are also more effective than density-sensitive active learning methods in most of the envisioned scenarios.
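
The idea of iteratively building a ranked list can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so everything below is an assumption made for illustration: the name `ranked_batch`, the `uncertainty` and `similarity` inputs, and the `alpha` weighting that trades off diversity against uncertainty are all hypothetical and should not be read as the authors' exact formulation.

```python
from typing import List, Sequence


def ranked_batch(
    uncertainty: Sequence[float],
    similarity: Sequence[Sequence[float]],
    alpha: float = 0.5,
) -> List[int]:
    """Greedily order all unlabeled instances into one ranked list.

    uncertainty[i]: hypothetical uncertainty score in [0, 1] for instance i.
    similarity[i][j]: hypothetical similarity in [0, 1] between i and j.
    alpha: assumed trade-off between diversity and uncertainty.
    """
    remaining = list(range(len(uncertainty)))
    ranked: List[int] = []
    while remaining:
        def score(i: int) -> float:
            # Highest similarity to anything already placed in the ranking;
            # 0.0 while the ranking is still empty.
            closest = max((similarity[i][j] for j in ranked), default=0.0)
            return alpha * (1.0 - closest) + (1.0 - alpha) * uncertainty[i]

        # Pick the best-scoring candidate, then treat it as "covered" so that
        # near-duplicates of it are pushed further down the list.
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Instances 0 and 1 are near-duplicates; instance 2 is different but less uncertain.
uncertainty = [0.9, 0.85, 0.2]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
order = ranked_batch(uncertainty, similarity, alpha=0.5)
```

Because the output is a full ordering rather than a fixed-size batch, an annotator could label from the top of the list and stop at any point, without the selection procedure having to be rerun after every small batch.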

Full Text