Abstract
Traditional scheduling problems in stochastic queueing systems assume that the statistical parameters are known a priori. In ''Learning unknown service rates in queues: A multiarmed bandit approach'', Krishnasamy, Sen, Johari, and Shakkottai consider the problem of online scheduling in a parallel-server system when the statistical parameters are unknown. They study this question in the stochastic multiarmed bandits framework with the queue length as the performance objective. In contrast to the classic stochastic multiarmed bandits problem, where the regret scales logarithmically with time, the authors show that the queue regret (difference in expected queue length between a bandit algorithm and a genie policy) exhibits a more complex behavior. It grows logarithmically in the initial stage and eventually decays almost inversely with time. This remarkable behavior is explained through the analysis of regenerative cycle lengths, which shorten with time as the bandit algorithm learns to stabilize the queues.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.