Abstract

The hypothesis selection problem (also known as the k-armed bandit problem) is central to the realization of many learning systems. This paper studies the minimization of sampling cost in hypothesis selection under a probably approximately optimal (PAO) learning framework. Hypothesis selection algorithms can be exploration-oriented or exploitation-oriented. Exploration-oriented algorithms eagerly explore unfamiliar alternatives, while exploitation-oriented algorithms concentrate their sampling effort on the hypotheses that have yielded higher utility in the past. Both the exploration and the exploitation elements of a hypothesis selection algorithm can help reduce sampling cost. We propose a novel family of learning algorithms, the γ-IE family, that explicitly trades off exploration tendency against exploitation tendency. We establish the sample complexity of the entire γ-IE family. We show empirically that no single algorithm in this family is cost-optimal for all problems. In addition, our parameterization of the family allows users to select the instantiation that best fits their application. Our results also imply that the PALO class of speed-up learners retains its theoretical properties even when a more sophisticated sampling strategy is used.
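For intuition, the sketch below shows one plausible reading of such a γ-indexed interval-estimation selector: each hypothesis is scored by its empirical mean utility plus γ times a confidence-interval half-width, so γ = 0 gives a purely exploitative greedy rule and larger γ weights exploration more heavily. The index form, the half-width formula, and the names (GammaIE, select, update) are illustrative assumptions, not the paper's definitions.

```python
import math
import random


class GammaIE:
    """Sketch of a gamma-IE-style hypothesis selector.

    Assumption (not from the abstract): a hypothesis's index is its
    empirical mean utility plus gamma times a confidence-interval
    half-width, so gamma = 0 is purely exploitative and larger gamma
    explores more eagerly.
    """

    def __init__(self, n_hypotheses, gamma):
        self.gamma = gamma
        self.counts = [0] * n_hypotheses   # samples drawn per hypothesis
        self.sums = [0.0] * n_hypotheses   # cumulative observed utility
        self.t = 0                         # total samples drawn so far

    def select(self):
        """Return the index of the hypothesis to sample next."""
        self.t += 1
        # Sample each hypothesis once before trusting any interval.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i

        def score(i):
            mean = self.sums[i] / self.counts[i]
            # Hoeffding-style half-width; shrinks as a hypothesis is
            # sampled more often.
            width = math.sqrt(2.0 * math.log(self.t) / self.counts[i])
            # gamma trades exploration (width) against exploitation (mean).
            return mean + self.gamma * width

        return max(range(len(self.counts)), key=score)

    def update(self, i, utility):
        """Record the utility observed for hypothesis i."""
        self.counts[i] += 1
        self.sums[i] += utility


# Tiny usage example with three Bernoulli-utility hypotheses.
random.seed(0)
probs = [0.2, 0.5, 0.8]
agent = GammaIE(n_hypotheses=3, gamma=1.0)
for _ in range(1000):
    i = agent.select()
    agent.update(i, 1.0 if random.random() < probs[i] else 0.0)
print(agent.counts)  # most samples should concentrate on the best hypothesis
```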
