Learning to Control Renewal Processes with Bandit Feedback

Semih Cayci, R Srikant, Atilla Eryilmaz

doi:10.1145/3376930.3376957

Abstract

We consider a bandit problem with K task types, from which the controller activates one task at a time. Each task takes a random and possibly heavy-tailed completion time, and a reward is obtained only after the task is completed. The task types are independent of each other and have distinct, unknown distributions for completion time and reward. For a given time horizon τ, the goal of the controller is to schedule tasks adaptively so as to maximize the reward collected until τ expires. We also allow the controller to interrupt a task and initiate a new one. Beyond the traditional exploration-exploitation dilemma, this interrupt mechanism introduces a new one: should the controller complete the task and get the reward, or interrupt the task for a possibly shorter and more rewarding alternative? We show that for all heavy-tailed and some light-tailed completion time distributions, this interruption mechanism improves the reward linearly over time. From a learning perspective, the interrupt mechanism necessitates implicitly learning statistics beyond the mean from truncated observations. For this purpose, we propose a robust learning algorithm named UCB-BwI based on the median-of-means estimator for possibly heavy-tailed reward and completion time distributions. We show that, in a K-armed bandit setting with an arbitrary set of L possible interrupt times, UCB-BwI achieves O(K log(τ) + KL) regret. We also prove that the regret under any admissible policy is Ω(K log(τ)), which implies that UCB-BwI is order-optimal.
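The median-of-means estimator named in the abstract can be illustrated with a short sketch. This is a generic textbook version in Python, not the UCB-BwI algorithm itself; the function name `median_of_means` and the `num_blocks` parameter are assumptions chosen for illustration, and the abstract does not say how the paper sets the number of blocks.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Generic median-of-means estimate of the mean (illustrative sketch).

    The samples are split into num_blocks consecutive blocks of equal size,
    each block is averaged, and the median of the block averages is returned.
    Samples left over after the last full block are discarded.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    # One extreme observation, as can occur with heavy-tailed completion times.
    observations = [1, 1, 1, 1, 1000, 1]
    print("empirical mean: ", sum(observations) / len(observations))
    print("median of means:", median_of_means(observations, num_blocks=3))
```

In this toy input the single outlier affects only one block average, so the median of the block averages stays near the typical value while the empirical mean is pulled far away. That robustness is the usual motivation for using this estimator with heavy-tailed data.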

More From: ACM SIGMETRICS Performance Evaluation Review

Similar Papers

Learning to Control Renewal Processes with Bandit Feedback
Semih Cayci ... R Srikant
20 Jun 2019

Learning to Control Renewal Processes with Bandit Feedback
Semih Cayci ... Atilla Eryilmaz
Proceedings of the ACM on Measurement and Analysis of Computing Systems | VOL. 3
19 Jun 2019

The simplicity of completion time distributions for common complex biochemical processes
Golan Bel ... Brian Munsky
Physical Biology | VOL. 7
21 Dec 2009

Reducing Task Completion Time in Mobile Offloading Systems through Online Adaptive Local Restart
Qiushi Wang ... Katinka Wolter
31 Jan 2015
