Abstract

Many predictive tasks require labeled data to induce classification models, and the data labeling process can be costly. Several strategies have been proposed to optimize the selection of the most relevant examples, a process referred to as active learning. However, a lack of empirical studies comparing different active learning approaches across multiple datasets makes it difficult to identify the most promising strategies, or even to assess the relative gain of active learning over the trivial random selection of instances. In this study, a comprehensive comparison of active learning strategies is presented, covering various instance selection criteria, different classification algorithms, and a large number of datasets. The experimental results confirm the effectiveness of active learning and provide insights into the relationship between classification algorithms and active learning strategies. Additionally, ranking curves with bands are introduced as a means to summarize in a single chart the performance of each active learning strategy across different classification algorithms and datasets.
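To make the comparison concrete, the following is a minimal sketch of pool-based active learning with uncertainty sampling against a random-selection baseline, the trivial strategy the abstract mentions. The dataset, classifier, and query budget here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=200, random_state=0
)

def run(strategy, n_queries=40):
    # Seed with one labeled example per class so the classifier can be fit.
    labeled = [int(np.where(y_pool == c)[0][0]) for c in np.unique(y_pool)]
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

    for _ in range(n_queries):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_pool[labeled], y_pool[labeled])
        if strategy == "uncertainty":
            # Query the instance whose top-class probability is lowest,
            # i.e. the one the current model is least confident about.
            proba = clf.predict_proba(X_pool[unlabeled])
            pick = int(np.argmin(proba.max(axis=1)))
        else:
            # Random baseline: label an arbitrary pool instance.
            pick = int(rng.integers(len(unlabeled)))
        labeled.append(unlabeled.pop(pick))

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_pool[labeled], y_pool[labeled])
    return accuracy_score(y_test, clf.predict(X_test))

print("uncertainty sampling:", run("uncertainty"))
print("random selection:    ", run("random"))
```

Repeating such a loop over many datasets and classifiers, and ranking the strategies per combination, is the kind of aggregation the paper's ranking curves with bands summarize.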

