Abstract

Regression problems arise frequently in practice and are therefore unavoidable in real-world applications. However, obtaining a model with the desired generalization performance usually requires a large number of labels. In many scenarios, unlabelled data is relatively inexpensive to obtain, so active learning approaches can be used to reduce the required annotation effort. Most uncertainty-based active learning algorithms for regression use variance estimates of model predictions to choose informative samples. These algorithms do not incorporate knowledge about the data distribution for the given task. In this paper, we propose a novel algorithm that incorporates information about the data distribution and combines it with variance estimation to form an informativeness function. Experiments conducted on four data sets show that the proposed approach outperforms standard variance-based sampling by a margin and indicate that it is robust.
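
To make the general idea concrete, the following is a minimal sketch of how a variance estimate and a data-distribution term could be combined into a single informativeness score. It is not the algorithm proposed in the paper, whose details the abstract does not give. Everything here is an assumption chosen for illustration: the bootstrapped linear ensemble as the variance estimator, the Gaussian kernel density as the distribution term, the product as the combination rule, and the hypothetical names `ensemble_variance`, `kernel_density`, and `select_query`.

```python
import numpy as np


def ensemble_variance(X_lab, y_lab, X_pool, n_models=20, seed=0):
    """Variance of pool predictions across bootstrapped linear models.

    Assumption: a bootstrap ensemble stands in for whatever variance
    estimator the underlying regression model provides.
    """
    rng = np.random.default_rng(seed)
    # Append a constant column so each model fits an intercept.
    A_lab = np.hstack([X_lab, np.ones((len(X_lab), 1))])
    A_pool = np.hstack([X_pool, np.ones((len(X_pool), 1))])
    preds = []
    for _ in range(n_models):
        # Resample the labelled set with replacement and refit.
        idx = rng.integers(0, len(X_lab), size=len(X_lab))
        w, *_ = np.linalg.lstsq(A_lab[idx], y_lab[idx], rcond=None)
        preds.append(A_pool @ w)
    return np.var(np.stack(preds), axis=0)


def kernel_density(X_pool, bandwidth=1.0):
    """Unnormalised Gaussian kernel density of each pool point within the pool.

    Assumption: this is one simple way to encode the data distribution.
    """
    sq_dists = ((X_pool[:, None, :] - X_pool[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean(axis=1)


def select_query(X_lab, y_lab, X_pool, bandwidth=1.0):
    """Return the index of the pool point with the highest combined score."""
    variance = ensemble_variance(X_lab, y_lab, X_pool)
    density = kernel_density(X_pool, bandwidth)
    # Assumption: multiply the two terms, so a point must be both
    # uncertain and representative of the pool to score highly.
    scores = variance * density
    return int(np.argmax(scores)), scores
```

With labelled arrays `X_lab`, `y_lab` and an unlabelled pool `X_pool`, calling `select_query(X_lab, y_lab, X_pool)` returns the index of the next sample to annotate together with the per-sample scores. Under the product rule assumed here, an isolated outlier with high predictive variance is down-weighted relative to an equally uncertain point in a dense region, which is the kind of behaviour that pure variance-based sampling cannot express.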
