A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

Mehrdad Moharrami, Arghyadip Roy, R Srikant, Yashaswini Murthy

doi:10.1287/moor.2022.0139

Abstract

We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.

Funding: This work was supported by the Office of Naval Research Global [Grant N0001419-1-2566], the Division of Computer and Network Systems [Grant 21-06801], the Army Research Office [Grant W911NF-19-1-0379], and the Division of Computing and Communication Foundations [Grants 17-04970 and 19-34986].
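The abstract does not spell out the risk-sensitive exponential cost, so the following is only a minimal sketch under assumptions of our own. It assumes the standard definition, the long-run growth rate (1/n) log E[exp(c(X_0) + ... + c(X_{n-1}))], and a finite, irreducible, aperiodic chain under one fixed policy with a cost that depends on the state only. In that setting the cost equals the log of the largest eigenvalue of the matrix with entries exp(c[i]) * P[i][j], which can be found by power iteration. The transition matrix `P`, the cost vector `c`, and both function names are hypothetical; this is not the paper's trajectory-based algorithm, only an illustration of how the criterion differs from the ordinary average cost.

```python
import math


def risk_sensitive_cost(P, c, iters=10_000, tol=1e-13):
    """Log of the Perron root of M[i][j] = exp(c[i]) * P[i][j].

    Assumes a fixed policy, state-dependent cost c, and a primitive
    transition matrix P (illustrative assumptions, not from the paper).
    """
    n = len(P)
    M = [[math.exp(c[i]) * P[i][j] for j in range(n)] for i in range(n)]
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        # One power-iteration step: w = M v, rescaled by its largest entry.
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(w)
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return math.log(new_lam)


def average_cost(P, c, iters=10_000):
    """Risk-neutral average cost: stationary distribution dotted with c."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # Push the distribution one step forward: pi <- pi P.
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(pi[i] * c[i] for i in range(n))


# Hypothetical two-state chain: state 0 is cheap, state 1 is costly.
P = [[0.9, 0.1],
     [0.5, 0.5]]
c = [0.0, 2.0]

rs = risk_sensitive_cost(P, c)
avg = average_cost(P, c)
print(f"risk-sensitive cost: {rs:.4f}, average cost: {avg:.4f}")
```

By Jensen's inequality the risk-sensitive value is never below the average cost, and the two coincide when the cost is constant. The gap reflects how heavily the exponential criterion penalizes visits to the costly state.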


More From: Mathematics of Operations Research


Similar Papers

MDP Formulation for Multi-UAVs Mission Planning with Refueling Constraints
Seung-Keol Ryu ... Byeong-Min Jeong
01 Jan 2023

SOBA: Session optimal MDP-based network friendly recommendations
Theodoros Giannakas ... Anastasios Giovanidis
10 May 2021

On systems of UAVs for persistent security presence: A generic network representation, MDP formulation and heuristics for task allocation
Minjun Kim ... James R Morrison
01 Jun 2019

A Markov Decision Model for Adaptive Scheduling of Stored Scalable Videos
Chao Chen ... Alan C Bovik
IEEE Transactions on Circuits and Systems for Video Technology | VOL. 23
01 Jun 2013
