Parameterized MDPs and Reinforcement Learning Problems-A Maximum Entropy Principle-Based Framework.

Amber Srivastava, Srinivasa M Salapaka

doi:10.1109/tcyb.2021.3102510

Open Access

Abstract

We present a framework to address a class of sequential decision-making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision-making problems modeled as infinite-horizon Markov decision processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon entropy of the trajectories under the MDP and determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. This resulting policy enhances the quality of exploration early on in the learning process, and consequently allows faster convergence rates and robust solutions even in the presence of noisy data, as demonstrated in our comparisons to popular algorithms such as Q-learning, Double Q-learning, and entropy-regularized Soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent, and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can possibly be nonconvex with multiple poor local minima. Simulation results applied to a 5G small cell network problem demonstrate the successful determination of communication routes and the small cell locations. We also obtain sensitivity measures to problem parameters and robustness to noisy environment data.
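The trade-off described in the abstract, maximizing trajectory entropy while keeping expected cost low, can be illustrated on a toy problem. The sketch below is not the authors' algorithm. It assumes deterministic transitions on a small acyclic graph (so that trajectory entropy reduces to accumulated policy entropy), a known cost on every edge, and a hypothetical trade-off parameter `beta`; the states, actions, and costs are invented for illustration. Under those assumptions, minimizing expected cost minus `1/beta` times trajectory entropy gives a Gibbs policy that can be computed by one backward pass.

```python
import math

# Hypothetical toy MDP: deterministic transitions on a small DAG, with
# "goal" as the absorbing state. edges[state][action] = (next_state, cost).
edges = {
    "s0": {"a": ("s1", 1.0), "b": ("s2", 2.0)},
    "s1": {"a": ("goal", 3.0)},
    "s2": {"a": ("goal", 1.0), "b": ("s1", 0.5)},
}
# Reverse topological order: every successor is processed before its parent.
order = ["s1", "s2", "s0"]


def max_entropy_policy(edges, order, beta):
    """Backward recursion for min over policies of E[cost] - (1/beta) * H(trajectory).

    Assumes deterministic, acyclic transitions (an illustrative simplification).
    Returns the Gibbs policy and the free energy of each state.
    """
    free_energy = {"goal": 0.0}
    policy = {}
    for s in order:
        # q[a] = step cost plus free energy of the successor state.
        q = {a: cost + free_energy[nxt] for a, (nxt, cost) in edges[s].items()}
        q_min = min(q.values())  # shift by the minimum for numerical stability
        weights = {a: math.exp(-beta * (v - q_min)) for a, v in q.items()}
        z = sum(weights.values())
        free_energy[s] = q_min - math.log(z) / beta
        policy[s] = {a: w / z for a, w in weights.items()}
    return policy, free_energy


# Large beta: cost dominates, so the policy concentrates on the cheapest route.
greedy_policy, greedy_energy = max_entropy_policy(edges, order, beta=50.0)
# Small beta: entropy dominates, so all routes to the goal are nearly equally likely.
exploratory_policy, _ = max_entropy_policy(edges, order, beta=1e-3)
print(greedy_policy["s0"], exploratory_policy["s0"])
```

In this toy graph two of the three routes from `s0` pass through action `b`, so as `beta` shrinks the probability of `b` at `s0` approaches 2/3 rather than 1/2. That is the difference between spreading probability over whole trajectories and spreading it over actions at each state.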

Journal: IEEE Transactions on Cybernetics
Publication Date: Sep 1, 2022
Citations: 6
License type: publisher-specific-oa
