Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities

Xinzhe Fu, Eytan Modiano

doi:10.1145/3570619

Abstract

We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic with unknown arrival rates and service rates. Upon the completion of each job from dispatcher u_n at server s_m, a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizon T. The performance of policies is measured by regret, which is defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to arrival rates, service rates and underlying utilities. We first show that the expected utility of the optimal dynamic policy is upper bounded by T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by arrival rates and service rates. We next propose a policy for the optimal control problem that is an integration of a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of priority-based and Join-the-Shortest-Queue routing at the dispatchers and priority-based scheduling at the servers, makes decisions based on the graphical structure induced by the extreme point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret whereas application of existing techniques to the optimal control problem would lead to Ω(√T) regret. The theoretical analysis is further complemented with simulations to evaluate the empirical performance of our policy.
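As an illustration of one ingredient named in the abstract, the following is a minimal sketch of plain Join-the-Shortest-Queue routing: an arriving job goes to the eligible server whose queue is currently shortest. It is not the paper's policy, which mixes this rule with priority-based routing and scheduling. The function name `jsq_route`, the server labels, and the tie-breaking rule (first eligible server wins) are all assumptions made for the example.

```python
from typing import Dict, Iterable


def jsq_route(queue_lengths: Dict[str, int], eligible: Iterable[str]) -> str:
    """Pick the eligible server with the shortest queue.

    Hypothetical helper for illustration only. Ties go to the server
    listed first in `eligible` (an assumed tie-breaking rule).
    """
    candidates = list(eligible)
    if not candidates:
        raise ValueError("no eligible server for this dispatcher")
    # min() returns the first minimal element, which gives the tie rule above.
    return min(candidates, key=lambda server: queue_lengths[server])


# Example: a dispatcher connected to three servers routes one job.
queues = {"s1": 3, "s2": 1, "s3": 1}
chosen = jsq_route(queues, ["s1", "s2", "s3"])
queues[chosen] += 1  # the routed job joins the chosen server's queue
```

In a setting like the one the abstract describes, the eligible set would presumably be restricted to the servers a dispatcher is connected to, so the same helper can be called with a different `eligible` list per dispatcher.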


Journal: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Publication Date: Dec 1, 2022
Citations: 2
License type: CC BY

