Abstract

Active learning is a setup that allows the learning algorithm to iteratively and strategically query the labels of some instances to reduce human labeling effort. One fundamental strategy, called uncertainty sampling, measures the uncertainty of each instance when making querying decisions. Traditional active learning algorithms focus on binary or multiclass classification, but few works have studied active learning for cost-sensitive multiclass classification (CSMCC), which allows charging different costs for different types of misclassification errors. These few works generally calculate the uncertainty of each instance by probability estimation and can suffer from inaccurate estimates. In this paper, we propose a novel active learning algorithm that calculates the uncertainty in a different way. The algorithm is based on our newly proposed cost embedding approach (CE) for CSMCC. CE embeds the cost information in the distance measure of a special hidden space using non-metric multidimensional scaling, and handles both symmetric and asymmetric cost information through our carefully designed mirroring trick. The embedding allows the proposed algorithm, active learning with cost embedding (ALCE), to define a cost-sensitive uncertainty measure from the distance in the hidden space. Extensive experimental results demonstrate that ALCE selects more useful instances by taking the cost information into account through the embedding and is superior to existing cost-sensitive active learning algorithms.
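
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation. The abstract does not spell out how the mirroring trick or the uncertainty measure are constructed, so everything here is an assumption: the helper names (`mirrored_dissimilarity`, `distance_uncertainty`, `select_query`) are hypothetical, the "mirroring" is modeled as giving each class two hidden points so that an asymmetric cost matrix can be stored in a symmetric dissimilarity matrix, and the uncertainty of an instance is taken to be the distance from its predicted hidden vector to the nearest class point. The scaling step itself and the learned mapping into the hidden space are omitted.

```python
import math


def mirrored_dissimilarity(cost):
    """Assumed construction: turn a K x K (possibly asymmetric) cost matrix
    into a symmetric 2K x 2K dissimilarity matrix plus a weight mask.

    Index i (0..K-1) stands for class i's first hidden point and index K + j
    for class j's mirrored hidden point. Only cross pairs (i, K + j) carry
    the cost cost[i][j]; all other pairs get weight 0 and would be ignored
    by a weighted scaling step.
    """
    k = len(cost)
    delta = [[0.0] * (2 * k) for _ in range(2 * k)]
    weight = [[0.0] * (2 * k) for _ in range(2 * k)]
    for i in range(k):
        for j in range(k):
            delta[i][k + j] = delta[k + j][i] = float(cost[i][j])
            weight[i][k + j] = weight[k + j][i] = 1.0
    return delta, weight


def distance_uncertainty(predicted, class_points):
    """Assumed uncertainty: distance from a predicted hidden vector to the
    nearest class point. A larger distance means a less certain instance."""
    return min(math.dist(predicted, p) for p in class_points)


def select_query(predictions, class_points):
    """Return the index of the unlabeled instance with the largest
    distance-based uncertainty."""
    scores = [distance_uncertainty(z, class_points) for z in predictions]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical asymmetric cost matrix: cost[i][j] is the cost of
    # predicting class j when the true class is i.
    cost = [[0, 1, 4],
            [2, 0, 1],
            [3, 5, 0]]
    delta, weight = mirrored_dissimilarity(cost)

    # Hypothetical 2-D class points and predicted hidden vectors for three
    # unlabeled instances (in practice these would come from the embedding
    # and a trained regressor).
    class_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    predictions = [(0.1, 0.0), (0.5, 0.5), (0.9, 0.1)]
    print(select_query(predictions, class_points))
```

In this toy example the second instance sits equally far from all three class points, so it is the one selected for labeling. Because the dissimilarities that define the hidden space come from the cost matrix, a distance-based score of this kind reflects misclassification costs without relying on estimated class probabilities.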
