Relaxed Equilibria for Time-Inconsistent Markov Decision Processes

Erhan Bayraktar, Yu-Jui Huang, Zhenhua Wang, Zhou Zhou

doi:10.1287/moor.2023.0209

Abstract

This paper considers an infinite-horizon Markov decision process (MDP) that allows for general nonexponential discount functions in both discrete and continuous time. Because of the inherent time inconsistency, we look for a randomized equilibrium policy (i.e., relaxed equilibrium) in an intrapersonal game between an agent’s current and future selves. When we modify the MDP by entropy regularization, a relaxed equilibrium is shown to exist by a nontrivial entropy estimate. As the degree of regularization diminishes, the entropy-regularized MDPs approximate the original MDP, which gives the general existence of a relaxed equilibrium in the limit by weak convergence arguments. As opposed to prior studies that consider only deterministic policies, our existence result does not require any convexity (or concavity) of the controlled transition probabilities and reward function. Interestingly, this benefit of considering randomized policies is unique to the time-inconsistent case.
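The time inconsistency mentioned in the abstract can be illustrated with a small numerical sketch. This is not taken from the paper: the hyperbolic discount function 1/(1 + kt), the rewards (10 and 15), the delays, and all function names are assumptions chosen only for illustration. The sketch compares a smaller-sooner reward with a larger-later one, first when the choice is immediate and then when both rewards are pushed far into the future.

```python
# Illustrative only: the discount functions, rewards, and delays below are
# assumptions made for this sketch, not values from the paper.

def hyperbolic(t, k=1.0):
    """A standard nonexponential (hyperbolic) discount function."""
    return 1.0 / (1.0 + k * t)

def exponential(t, beta=0.5):
    """Classical exponential discounting with factor beta."""
    return beta ** t

def prefers_later(discount, delay, small=10.0, large=15.0, gap=1):
    """Return True if `large` received at delay + gap is valued above
    `small` received at delay, both discounted from time 0."""
    return large * discount(delay + gap) > small * discount(delay)

# Evaluate the same trade-off when it is immediate (delay 0) and distant (delay 10).
hyper_choices = [prefers_later(hyperbolic, d) for d in (0, 10)]
exp_choices = [prefers_later(exponential, d) for d in (0, 10)]

print("hyperbolic: ", hyper_choices)
print("exponential:", exp_choices)
```

Under the hyperbolic discount the ranking flips between the two delays, so a plan made today is abandoned by the agent's future self; under exponential discounting the ranking depends only on the gap between the rewards and never flips. This preference reversal is what motivates treating the problem as a game between current and future selves.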

