Learning to Control Renewal Processes with Bandit Feedback

Semih Cayci, Atilla Eryilmaz, R. Srikant

doi:10.1145/3309697.3331515

Abstract

We consider a bandit problem with K task types, from which the controller activates one task at a time. Each task takes a random and possibly heavy-tailed completion time, and a reward is obtained only after the task is completed. The task types are independent of each other and have distinct, unknown distributions for completion time and reward. For a given time horizon τ, the goal of the controller is to schedule tasks adaptively so as to maximize the reward collected until τ expires. We also allow the controller to interrupt a task and initiate a new one. Beyond the traditional exploration-exploitation dilemma, this interrupt mechanism introduces a new one: should the controller complete the task and get the reward, or interrupt it for a possibly shorter and more rewarding alternative? We show that for all heavy-tailed and some light-tailed completion time distributions, this interruption mechanism improves the reward linearly over time. Applications of this model include server scheduling, optimal free sampling strategies in advertising, and adaptive content selection. From a learning perspective, the interrupt mechanism necessitates learning the whole arm distribution from truncated observations. For this purpose, we propose a robust learning algorithm named UCB-BwI, based on the median-of-means estimator, for possibly heavy-tailed reward and completion time distributions. We show that, in a K-armed bandit setting with an arbitrary set of L possible interrupt times, UCB-BwI achieves O(K log(τ) + KL) regret. We also prove that the regret under any admissible policy is Ω(K log(τ)), which implies that UCB-BwI is order-optimal.
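
The median-of-means estimator mentioned in the abstract can be illustrated with a short sketch. This is the generic textbook estimator only, not the authors' UCB-BwI algorithm; the function name `median_of_means`, the parameter `num_blocks`, and the Pareto test data are assumptions made for illustration.

```python
import random
import statistics


def median_of_means(samples, num_blocks):
    """Generic median-of-means estimate of the mean of `samples`.

    The samples are split into `num_blocks` equal-size blocks (leftover
    samples at the end are dropped), each block is averaged, and the
    median of the block averages is returned.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    # Hypothetical heavy-tailed data: Pareto samples with shape 1.5
    # (finite mean, infinite variance), chosen only for illustration.
    rng = random.Random(0)
    data = [rng.paretovariate(1.5) for _ in range(1000)]
    print("empirical mean :", sum(data) / len(data))
    print("median of means:", median_of_means(data, num_blocks=10))
```

A single extreme observation can shift the plain empirical mean by a large amount, but it can corrupt only one block average, so the median across blocks is much less sensitive to it. That robustness is the usual reason this estimator is chosen for heavy-tailed data.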


Similar Papers

Learning to Control Renewal Processes with Bandit Feedback
Semih Cayci ... Atilla Eryilmaz
Proceedings of the ACM on Measurement and Analysis of Computing Systems | VOL. 3
19 Jun 2019

Learning to Control Renewal Processes with Bandit Feedback
Semih Cayci ... Atilla Eryilmaz
ACM SIGMETRICS Performance Evaluation Review | VOL. 47
17 Dec 2019

The simplicity of completion time distributions for common complex biochemical processes
Golan Bel ... Brian Munsky
Physical Biology | VOL. 7
21 Dec 2009

Reducing Task Completion Time in Mobile Offloading Systems through Online Adaptive Local Restart
Qiushi Wang ... Katinka Wolter
31 Jan 2015
