Active Learning for Individual Data via Minimal Stochastic Complexity

Shachar Shayovitz, Meir Feder

doi:10.1109/allerton49937.2022.9929357

Abstract

Modern machine learning systems require massive amounts of labeled training data to achieve high accuracy. Active learning uses feedback to select the most informative data points for labeling, which significantly reduces the labeling effort. Many heuristics for selecting data points have been developed in recent years, but they are usually tailored to a specific task, and a general unified framework is lacking. In this work, the individual data setting is considered and an active learning criterion is proposed. In this setting, the features and labels, in both the training and the test sets, are specific, individual, deterministic quantities. Motivated by connections between source coding and minimax learning, the proposed criterion attempts to find data points which minimize the average Predictive Normalized Maximum Likelihood (pNML) on the unlabeled test set. Experiments on a real data set show that the proposed criterion performs better than other active learning criteria.

Full Text