Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes

Nihal Sharma, Rajat Sen, Karthikeyan Shanmugam, Soumya Basu, Sanjay Shakkottai

doi:10.1145/3680279

Abstract

We study a variant of the contextual bandit problem where an agent can intervene through a set of stochastic expert policies. Given a fixed context, each expert samples actions from a fixed conditional distribution. The agent seeks to remain competitive with the “best” among the given set of experts. We propose the Divergence-based Upper Confidence Bound (D-UCB) algorithm, which uses importance sampling to share information across experts, and provide horizon-independent constant regret bounds that scale only linearly in the number of experts. We also provide the Empirical D-UCB (ED-UCB) algorithm, which can function with only approximate knowledge of the expert distributions. Further, we investigate the episodic setting, where the agent interacts with an environment that changes over episodes. Each episode can have different context and reward distributions, so the best expert can change across episodes. We show that by bootstrapping from \(\mathcal{O}(N\log(NT^2\sqrt{E}))\) samples, ED-UCB guarantees a regret that scales as \(\mathcal{O}\big(E(N+1) + \frac{N\sqrt{E}}{T^2}\big)\) for \(N\) experts over \(E\) episodes, each of length \(T\). Finally, we empirically validate our findings through simulations.
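To illustrate how importance sampling lets samples gathered while one expert acts inform the estimate for a different expert, here is a minimal Python sketch. It is not the paper's D-UCB estimator: it uses a plain, unclipped importance-weighted mean over a finite set of contexts and actions. The abstract says D-UCB is divergence-based, but its exact estimator is not stated there and is not reproduced here. The function names `simulate`, `importance_weighted_mean` and `true_mean`, along with the toy distributions, are hypothetical and serve illustration only.

```python
import random

def simulate(policy, reward, context_probs, n, rng):
    """Draw n (context, action, reward) samples while `policy` chooses the actions."""
    contexts = list(context_probs)
    context_weights = [context_probs[x] for x in contexts]
    samples = []
    for _ in range(n):
        x = rng.choices(contexts, weights=context_weights)[0]
        # The acting expert samples an action from its conditional distribution given x.
        a = rng.choices(range(len(policy[x])), weights=policy[x])[0]
        samples.append((x, a, reward[x][a]))
    return samples

def importance_weighted_mean(samples, target, behavior):
    """Estimate the mean reward of `target` from samples collected under `behavior`."""
    total = 0.0
    for x, a, y in samples:
        # Reweight each observed reward by the ratio of the target's to the
        # behavior expert's probability of the action that was actually taken.
        total += target[x][a] / behavior[x][a] * y
    return total / len(samples)

def true_mean(policy, reward, context_probs):
    """Exact expected reward of `policy`, used only to check the estimate."""
    return sum(
        px * sum(p * r for p, r in zip(policy[x], reward[x]))
        for x, px in context_probs.items()
    )

# Hypothetical toy problem: 2 contexts, 2 actions, deterministic rewards.
context_probs = {0: 0.5, 1: 0.5}
reward = {0: [1.0, 0.0], 1: [0.0, 1.0]}
behavior = {0: [0.5, 0.5], 1: [0.5, 0.5]}  # expert whose actions we observe
target = {0: [0.9, 0.1], 1: [0.2, 0.8]}    # expert we never play directly

rng = random.Random(0)
samples = simulate(behavior, reward, context_probs, 100_000, rng)
estimate = importance_weighted_mean(samples, target, behavior)
exact = true_mean(target, reward, context_probs)
print(f"estimated {estimate:.3f}, exact {exact:.3f}")
```

The estimator is unbiased only when the behavior expert puts positive probability on every action the target expert might take. Its variance grows as the two conditional distributions move apart, because individual weights `target[x][a] / behavior[x][a]` can become large. That sensitivity to how different two experts are is the general motivation for divergence-aware estimators. How D-UCB handles it specifically should be taken from the paper itself.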

Journal: ACM Transactions on Modeling and Performance Evaluation of Computing Systems
Publication Date: Aug 13, 2024
License type: other-oa

Similar Papers

On the regret of online edge service hosting
R Sri Prakash ... Sharayu Moharir
Performance Evaluation | VOL. 162
11 Sep 2023

On the Regret of Online Edge Service Hosting
R Sri Prakash ... Nikhil Karamchandani
ACM SIGMETRICS Performance Evaluation Review | VOL. 50
26 Apr 2023

On the Regret of Online Edge Service Hosting
R Sri Prakash ... Sharayu Moharir
19 Sep 2022

Constant Regret Resolving Heuristics for Price-Based Revenue Management
Yining Wang ... He Wang
Operations Research | VOL. 70
31 Jan 2022
