Toward optimal probabilistic active learning using a Bayesian approach

Daniel Kottke, Denis Huseljic, Marek Herde, Christoph Sandrock, Bernhard Sick, Georg Krempl

doi:10.1007/s10994-021-05986-9

Open Access

https://doi.org/10.1007/s10994-021-05986-9

Abstract

Gathering labeled data to train well-performing machine learning models is one of the critical challenges in many applications. Active learning aims at reducing the labeling costs by an efficient and effective allocation of costly labeling resources. In this article, we propose a decision-theoretic selection strategy that (1) directly optimizes the gain in misclassification error, and (2) uses a Bayesian approach by introducing a conjugate prior distribution to determine the class posterior to deal with uncertainties. By reformulating existing selection strategies within our proposed model, we can explain which aspects are not covered by the current state of the art and why this leads to the superior performance of our approach. Extensive experiments on a large variety of datasets and different kernels validate our claims.

Highlights

To train classifiers with machine learning algorithms in a supervised manner, we need labeled data
We propose a new decision-theoretic selection strategy xPAL, which calculates the gain in performance using a Bayesian approach over a set of unlabeled evaluation instances
We moved toward optimal probabilistic active learning (AL) by proposing xPAL

Summary

Introduction

To train classifiers with machine learning algorithms in a supervised manner, we need labeled data. Whereas gathering unlabeled instances is easy, the annotation with class labels is often expensive, exhaustive, or time-consuming and needs to be optimized. Many active learning (AL) algorithms rely completely on the information provided by the classifier that is to be optimized (Settles 2009). This might lead to problems: classifiers are originally designed under the assumption that the training data is representative of the learning task. This assumption is not valid for AL tasks, as the distribution of labeled instances is biased by the selection strategy (Dasgupta 2009). Consequently, the estimates (e.g., the class probabilities) from a classifier are biased, which may lead to a poor assessment of the usefulness of labeling candidates.
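To make the role of a conjugate prior concrete, the following is a minimal sketch, not the xPAL strategy itself. It compares plain relative-frequency estimates of the class probabilities with the posterior mean under a symmetric Dirichlet prior, which is the conjugate prior of a categorical distribution. The function names, the use of raw label counts, and the prior strength alpha = 1 are assumptions chosen for illustration and are not taken from the article.

```python
from typing import List, Sequence


def frequency_estimate(counts: Sequence[float]) -> List[float]:
    """Relative class frequencies from observed label counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no labels observed; frequencies are undefined")
    return [c / total for c in counts]


def dirichlet_posterior_mean(counts: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Posterior mean of the class probabilities under a symmetric
    Dirichlet(alpha) prior (alpha is an illustrative assumption)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


if __name__ == "__main__":
    counts = [2, 0]  # two labels seen for class 0, none for class 1
    print(frequency_estimate(counts))        # [1.0, 0.0]
    print(dirichlet_posterior_mean(counts))  # [0.75, 0.25]
```

With only two observed labels, the frequency estimate is fully certain about class 0, whereas the Dirichlet posterior mean still reserves probability mass for the unseen class. Unlike the frequency estimate, the posterior mean is also defined when no labels have been observed at all.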

Objectives

Methods

Findings

Conclusion