Learning to Schedule Network Resources Throughput and Delay Optimally Using Q+-Learning

Jeongmin Bae, Song Chong, Joohyun Lee

doi:10.1109/tnet.2021.3051663

Abstract

As network architectures become more complex and user requirements become more diverse, efficient network resource management becomes increasingly important. However, existing throughput-optimal scheduling algorithms, such as the max-weight algorithm, suffer from poor delay performance. In this paper, we present reinforcement learning-based network scheduling algorithms for a single-hop downlink scenario that achieve throughput-optimality and converge to minimal delay. To this end, we first formulate the network optimization problem as a Markov decision process (MDP). Then, we introduce a new state-action value function called the Q+-function and develop a reinforcement learning algorithm called Q+-learning with UCB (Upper Confidence Bound) exploration, which guarantees a small performance loss during the learning process. We also derive an upper bound on the sample complexity of our algorithm, which is tighter than the best known bound for Q-learning with UCB exploration by a factor of γ², where γ is the discount factor of the MDP. Finally, via simulation, we verify that our algorithm reduces delay by up to 40.8% compared to the max-weight algorithm across various scenarios. We also show that Q+-learning with UCB exploration converges to an ε-optimal policy 10 times faster than Q-learning with UCB.
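The max-weight baseline named in the abstract can be sketched as below. This is a minimal illustration under assumptions that are ours, not the paper's: a slotted single-hop downlink in which the base station serves one user per slot, Bernoulli packet arrivals, and ON/OFF channels with rate 0 or 1. The function names `max_weight_schedule` and `simulate` are hypothetical. The sketch does not implement the authors' Q+-learning; it only shows the throughput-optimal policy that the paper uses as its delay reference.

```python
import random


def max_weight_schedule(queues, rates):
    """Return the index of the user maximizing queue_length * channel_rate.

    Ties are broken toward the lowest index. Returns None when no user has a
    positive weight, i.e. nothing useful can be transmitted in this slot.
    """
    best, best_w = None, 0
    for i, (q, r) in enumerate(zip(queues, rates)):
        w = q * r
        if w > best_w:
            best, best_w = i, w
    return best


def simulate(arrival_probs, on_probs, slots=10_000, seed=0):
    """Simulate a hypothetical single-hop downlink under max-weight scheduling.

    Assumed model: each slot, every channel is independently ON (rate 1) with
    probability on_probs[j], else OFF (rate 0); one user is served; then each
    queue receives one packet with probability arrival_probs[j].
    Returns the time-averaged total queue length (backlog).
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    n = len(arrival_probs)
    queues = [0] * n
    total = 0
    for _ in range(slots):
        # 1) Observe the current channel states.
        rates = [1 if rng.random() < p else 0 for p in on_probs]
        # 2) Serve the max-weight user, never driving a queue below zero.
        i = max_weight_schedule(queues, rates)
        if i is not None:
            queues[i] = max(0, queues[i] - rates[i])
        # 3) Add this slot's Bernoulli arrivals.
        for j, p in enumerate(arrival_probs):
            if rng.random() < p:
                queues[j] += 1
        total += sum(queues)
    return total / slots
```

For example, `simulate([0.3, 0.3], [0.8, 0.8])` returns the time-averaged backlog; dividing it by the total arrival rate (0.6 packets per slot here) gives, by Little's law, approximately the average packet delay in slots. Max-weight is myopic: it decides using only the current backlogs and channel rates, without planning over future states, which is the kind of decision an MDP formulation can instead optimize over time.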

Journal: IEEE/ACM Transactions on Networking (a joint publication of the IEEE Communications Society, the IEEE Computer Society, and the ACM with its Special Interest Group on Data Communication)
Publication Date: Jan 28, 2021
Citations: 43

Similar Papers

Conversion of MDP problems into heuristics based planning problems using temporal decomposition
Rida Gillani ... Ali Nasir
01 Jan 2015
01 Jan 2015

Age of Aggregated Information: Timely Status Update with Over-The-Air Computation
Jie Li ... He Chen
01 Dec 2020

A Markov decision process approach to vacant taxi routing with e-hailing
Xinlian Yu ... Hyoshin Park
Transportation Research Part B: Methodological, Vol. 121
15 Jan 2019

Demand Response Management for Profit Maximizing Energy Loads in Real-Time Electricity Market
Shuoyao Wang ... Suzhi Bi
IEEE Transactions on Power Systems (a publication of the Power Engineering Society), Vol. 33
01 Nov 2018
