Pool-based active learning with optimal sampling distribution and its information geometrical interpretation

Takafumi Kanamori

doi:10.1016/j.neucom.2006.11.024

Abstract

We propose a pool-based active learning algorithm with approximately optimal sampling distributions. We also give an intuitive account of why active learning is effective, from the viewpoint of information geometry. In active learning, one can choose informative input points or input distributions. An appropriate choice of data points is expected to make predictions more accurate than random data selection does. Conventional active learning methods, however, suffer from serious estimation bias when the parametric statistical model does not include the true probability distribution. To correct this bias, we apply the maximum weighted log-likelihood estimator with an approximately optimal input distribution. The optimal input distribution for active learning can be obtained by simple regression estimation. Numerical studies show the effectiveness of the proposed learning algorithm.
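
The bias correction described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the sampling probabilities `p` below are a hypothetical choice rather than the approximately optimal distribution, and the target function, noise level, and all names are assumptions made for illustration. Under a Gaussian regression model, maximizing the weighted log-likelihood reduces to weighted least squares, with each selected point weighted by the ratio of the pool distribution to the sampling distribution.

```python
import numpy as np

def weighted_mle_linear(X, y, w):
    """Maximize sum_i w_i * log N(y_i | X_i @ theta, sigma^2) over theta.

    For a Gaussian model this is weighted least squares, solved here by
    scaling each row by sqrt(w_i) and calling ordinary least squares.
    """
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta

def features(x):
    # Linear model with intercept: deliberately misspecified for true_f below.
    return np.column_stack([np.ones_like(x), x])

def true_f(x):
    # Hypothetical true regression function, outside the linear model.
    return x + 0.5 * x**2

rng = np.random.default_rng(0)

# Pool of unlabeled inputs, drawn from the distribution we want to predict on.
pool_x = rng.uniform(-1.0, 1.0, size=5000)

# Hypothetical (not optimal) sampling probabilities over the pool,
# favoring points with large |x|.
p = 0.1 + np.abs(pool_x)
p /= p.sum()

# Query labels for points drawn from the pool according to p.
idx = rng.choice(pool_x.size, size=2000, replace=True, p=p)
x = pool_x[idx]
y = true_f(x) + 0.1 * rng.normal(size=x.size)

# Importance weights: (uniform pool probability) / (sampling probability).
w = 1.0 / (pool_x.size * p[idx])

theta_weighted = weighted_mle_linear(features(x), y, w)
theta_unweighted = weighted_mle_linear(features(x), y, np.ones_like(w))

# Reference: best linear fit to the noiseless target over the whole pool.
theta_star = np.linalg.lstsq(features(pool_x), true_f(pool_x), rcond=None)[0]

err_weighted = np.linalg.norm(theta_weighted - theta_star)
err_unweighted = np.linalg.norm(theta_unweighted - theta_star)
print("weighted error:  ", err_weighted)
print("unweighted error:", err_unweighted)
```

In this setup the unweighted estimator converges to the best linear fit under the sampling distribution rather than under the pool distribution, which is the estimation bias the abstract refers to; the weights remove that bias at the cost of some extra variance.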
