Abstract
We study the problem of Conditional Value-at-Risk (CVaR) optimization for a finite-state Markov Decision Process (MDP) with total discounted costs and the reduction of this problem to a stochastic game with perfect information. The CVaR optimization problem for finite- and infinite-horizon MDPs can be reformulated as a zero-sum stochastic game with a compact state space. This game has the following property: while the second player has perfect information, including knowledge of the decision chosen by the first player at the current time instant, the first player does not directly observe the augmented component of the state and does not know the current and past decisions chosen by the second player. Using methods of convex analysis, we show that optimal policies exist for this game and that an optimal policy of the first player optimizes the CVaR of the total discounted costs. In addition to proving the existence of optimal policies, we provide algorithms for their computation.
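As an illustration of the objective being optimized (not part of the paper itself), the CVaR at level alpha of a cost distribution is the expected cost in the worst alpha-fraction of outcomes. A minimal sketch, estimating it from sampled total discounted costs via the Rockafellar-Uryasev representation; the function name and sample data are illustrative assumptions:

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Estimate CVaR_alpha of a cost sample (higher cost = worse).

    Uses the Rockafellar-Uryasev form:
        CVaR_alpha(X) = min_c { c + E[(X - c)_+] / alpha },
    with the minimizer c equal to the (1 - alpha)-quantile (VaR).
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, 1.0 - alpha)  # Value-at-Risk threshold
    # Average shortfall beyond VaR, scaled by the tail probability alpha
    return var + np.mean(np.maximum(costs - var, 0.0)) / alpha

# Example: sampled total discounted costs from simulated trajectories
sample = [1.0, 2.0, 3.0, 4.0]
print(empirical_cvar(sample, 0.5))  # mean of the worst half: 3.5
```

With alpha = 1 this recovers the risk-neutral expected cost; smaller alpha focuses the objective on the tail of the cost distribution.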