Abstract
The hypothesis selection problem (or the k-armed bandit problem) is central to the realization of many learning systems. This paper studies the minimization of sampling cost in hypothesis selection under a probably approximately optimal (PAO) learning framework. Hypothesis selection algorithms can be exploration-oriented or exploitation-oriented. Exploration-oriented algorithms eagerly explore unfamiliar alternatives, while exploitation-oriented algorithms focus their sampling effort on the hypotheses that have yielded higher utility in the past. Both the exploration and the exploitation elements of a hypothesis selection algorithm can be useful in reducing sampling cost. We propose a novel family of learning algorithms, the γ-IE family, that explicitly trades off exploration tendency against exploitation tendency. We establish the sample complexity of the entire γ-IE family. We show empirically that no algorithm in this family is cost-optimal for all problems. In addition, our novel parameterization of the family allows users to select the instantiation that best fits their application. Our results also imply that the PALO class of speed-up learners can retain their theoretical properties even when a more sophisticated sampling strategy is used.
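To make the γ trade-off concrete, the following is a minimal sketch in the spirit of interval-estimation (IE) bandit rules: each hypothesis is scored by its sample mean plus γ times a confidence-width bonus, so γ = 0 recovers greedy exploitation while larger γ steers sampling toward less-explored hypotheses. The score function, names, and bonus form here are illustrative assumptions, not necessarily the paper's exact γ-IE rule.

```python
import math

def gamma_ie_select(stats, gamma):
    """Select a hypothesis index by a gamma-weighted upper confidence score.

    stats : list of (sample_mean, n_samples) pairs, one per hypothesis;
            each hypothesis is assumed to have been sampled at least once.
    gamma : exploration weight. gamma = 0 is purely exploitative (greedy
            on the sample mean); larger gamma inflates the confidence
            bonus and favors hypotheses with fewer samples.
    """
    total = sum(n for _, n in stats)
    def score(mean, n):
        # Confidence half-width shrinks as a hypothesis accumulates samples.
        return mean + gamma * math.sqrt(math.log(total) / n)
    return max(range(len(stats)), key=lambda i: score(*stats[i]))

# Example: a well-sampled strong hypothesis vs. a barely-sampled rival.
stats = [(0.8, 50), (0.6, 3)]
print(gamma_ie_select(stats, gamma=0.0))  # -> 0: exploit the best mean
print(gamma_ie_select(stats, gamma=2.0))  # -> 1: explore the uncertain one
```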