Abstract

In this article, we introduce and investigate 3DS, a novel selection strategy for pool-based active training of a generative classifier, namely CMM (a classifier based on a probabilistic mixture model). Such a generative classifier aims to model the processes underlying the “generation” of the data. 3DS considers three measures: the distance of samples to the decision boundary, the density in regions where samples are selected, and the diversity of the samples in the query set that are chosen for labeling, e.g., by a human domain expert. The combination of these three measures is adaptive in the sense that the weights of the distance and density measures depend on the uniqueness of the classification. On nine benchmark data sets, 3DS outperforms a random selection strategy (the baseline method), a pure closest sampling approach, ITDS (information theoretic diversity sampling), DWUS (density-weighted uncertainty sampling), DUAL (dual strategy for active learning), and PBAC (prototype based active learning) with respect to evaluation criteria such as ranked performance based on classification accuracy, number of labeled samples (data utilization), and learning speed as assessed by the area under the learning curve.
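
To make the interplay of the three measures concrete, the following is a minimal Python sketch of a greedy query-set selection that mixes a distance, a density, and a diversity term. It is not the 3DS formulation from the article: the function names (`margin`, `select_query_set`), the use of the posterior margin as a proxy for the distance to the decision boundary, the mean margin as a stand-in for the "uniqueness of the classification", the linear weighting, and the `diversity_weight` parameter are all assumptions made purely for illustration.

```python
import math


def margin(posteriors):
    """Gap between the two largest class posteriors (0 means on the boundary)."""
    top, second = sorted(posteriors, reverse=True)[:2]
    return top - second


def select_query_set(pool, posteriors, densities, k, diversity_weight=0.3):
    """Greedily pick up to k pool indices using distance, density, and diversity.

    pool       -- list of feature vectors (unlabeled candidates)
    posteriors -- class posteriors of the current classifier, one list per candidate
    densities  -- density estimate per candidate, assumed to be scaled to [0, 1]
    All weighting choices below are illustrative assumptions, not the 3DS rule.
    """
    margins = [margin(p) for p in posteriors]

    # Assumption: "uniqueness" is approximated by the mean margin over the pool.
    # A higher value shifts weight to the distance term, a lower one to density.
    uniqueness = sum(margins) / len(margins)
    w_distance = uniqueness
    w_density = 1.0 - uniqueness

    # Largest pairwise distance in the pool, used to scale diversity to [0, 1].
    max_dist = max(math.dist(a, b) for a in pool for b in pool) or 1.0

    selected = []
    while len(selected) < min(k, len(pool)):
        best_idx, best_score = None, -math.inf
        for i in range(len(pool)):
            if i in selected:
                continue
            closeness = 1.0 - margins[i]  # large when near the decision boundary
            if selected:
                # Diversity: scaled distance to the nearest already-chosen sample.
                diversity = min(math.dist(pool[i], pool[j]) for j in selected) / max_dist
            else:
                diversity = 1.0
            base = w_distance * closeness + w_density * densities[i]
            score = (1.0 - diversity_weight) * base + diversity_weight * diversity
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected
```

In this sketch, raising `diversity_weight` pushes the second and later picks away from samples already in the query set, while lowering it lets the boundary-closeness and density terms dominate.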
