Transductive active learning – A new semi-supervised learning approach based on iteratively refined generative models to capture structure in data

Tobias Reitmaier, Adrian Calma, Bernhard Sick

doi:10.1016/j.ins.2014.09.009

Abstract

Pool-based active learning is a paradigm in which users (e.g., domain experts) are iteratively asked to label initially unlabeled data, e.g., to train a classifier from these data. An appropriate selection strategy has to choose unlabeled data for such user queries efficiently and effectively (in principle, achieving high classification performance at low labeling cost). In our transductive active learning approach, we provide a completely labeled data pool in each active learning cycle: samples are labeled either by the experts or in a semi-supervised way. A key aspect is to explore and exploit information about structure in data. Such structure can be detected and modeled by means of clustering algorithms or probabilistic generative modeling techniques, for instance. Usually, this is done at the beginning of the active learning process, when the data are still unlabeled. We show how a probabilistic generative model, initially parametrized with unlabeled data, can be iteratively refined and improved as more and more labels become available during the active learning process. In each cycle, we use this generative model to label all samples not yet labeled by an expert, and then train the target classifier on the resulting completely labeled pool. Thus, this transductive learning process can be combined with any selection strategy and any kind of classifier. Here, we combine it with the 4DS selection strategy and the CMM probabilistic classifier described in previous work. For 20 publicly available benchmark data sets, we show that this new transductive learning process noticeably improves pool-based active learning.
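The cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every component is a simplified stand-in chosen for brevity: a hard-assignment k-means model replaces the probabilistic generative model, splitting a component whose expert labels disagree replaces the refinement step, a nearest-class-mean classifier replaces CMM, and smallest-margin uncertainty sampling replaces 4DS. All function names, the parameters `k` and `budget`, and the toy data are assumptions made for illustration.

```python
import random

def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(centers, x):
    return min(range(len(centers)), key=lambda j: sqdist(centers[j], x))

def lloyd(points, centers, iters=10):
    """A few k-means (Lloyd) iterations starting from the given centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(centers, p)].append(p)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def component_classes(points, centers, expert):
    """Majority expert label per component (None if it holds no expert label)."""
    votes = [{} for _ in centers]
    for i, y in expert.items():
        v = votes[nearest(centers, points[i])]
        v[y] = v.get(y, 0) + 1
    return [max(v, key=v.get) if v else None for v in votes]

def refine(points, centers, expert):
    """Stand-in refinement: split one component whose expert labels disagree."""
    classes = component_classes(points, centers, expert)
    extra = [points[i] for i, y in expert.items()
             if classes[nearest(centers, points[i])] != y]
    return lloyd(points, centers + extra[:1]) if extra else centers

def pseudo_label(points, centers, expert):
    """Label the whole pool: expert labels where available, model labels elsewhere."""
    classes = component_classes(points, centers, expert)
    known = [j for j, c in enumerate(classes) if c is not None]
    labels = []
    for i, p in enumerate(points):
        if i in expert:
            labels.append(expert[i])
            continue
        j = nearest(centers, p)
        if classes[j] is None:  # fall back to the closest component with a class
            j = min(known, key=lambda m: sqdist(centers[m], centers[j]))
        labels.append(classes[j])
    return labels

def train_classifier(points, labels):
    """Nearest-class-mean classifier (stand-in for CMM) on the fully labeled pool."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(g) for c in zip(*g)) for y, g in groups.items()}

def predict(means, x):
    return min(means, key=lambda y: sqdist(means[y], x))

def select(points, means, expert):
    """Smallest-margin uncertainty sampling (stand-in for 4DS)."""
    def margin(i):
        d = sorted(sqdist(m, points[i]) for m in means.values())
        # With a single known class, prefer the point farthest from its mean.
        return d[1] - d[0] if len(d) > 1 else -d[0]
    return min((i for i in range(len(points)) if i not in expert), key=margin)

def transductive_active_learning(points, oracle, k=4, budget=10, seed=0):
    rng = random.Random(seed)
    centers = lloyd(points, rng.sample(points, k))  # structure from unlabeled data
    first = rng.randrange(len(points))
    expert = {first: oracle(first)}                 # pool index -> expert label
    for _ in range(budget - 1):
        centers = refine(points, centers, expert)         # refine model with labels
        labels = pseudo_label(points, centers, expert)    # completely labeled pool
        means = train_classifier(points, labels)          # train target classifier
        query = select(points, means, expert)             # choose next expert query
        expert[query] = oracle(query)
    centers = refine(points, centers, expert)
    means = train_classifier(points, pseudo_label(points, centers, expert))
    return means, expert

# Toy pool: two Gaussian blobs; the oracle plays the role of the domain expert.
data_rng = random.Random(1)
pts = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)] + \
      [(data_rng.gauss(6, 1), data_rng.gauss(6, 1)) for _ in range(50)]
truth = [0] * 50 + [1] * 50
means, expert = transductive_active_learning(pts, lambda i: truth[i])
acc = sum(predict(means, p) == t for p, t in zip(pts, truth)) / len(pts)
print(f"expert labels used: {len(expert)}, pool accuracy: {acc:.2f}")
```

Because the classifier and the selection strategy only interact with the completely labeled pool, either stand-in could be swapped for another method without touching the rest of the loop, which mirrors the modularity claimed in the abstract.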

Full Text