Abstract
A large class of problems in decision making under uncertainty can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs), with applications in artificial intelligence and operations research, among others. In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints expressed in terms of dynamic coherent risk measures rather than the traditional total expectation; we refer to this as the constrained risk-averse problem. Our contributions are as follows. For MDPs, under some mild assumptions, we propose an optimization-based method to synthesize Markovian policies. We then demonstrate that such policies can be found by solving difference convex programs (DCPs). We show that our formulation generalizes linear programs for constrained MDPs with total discounted expected costs and constraints. For POMDPs, we show that, if the coherent risk measures can be defined as a Markov risk transition mapping, an infinite-dimensional optimization can be used to design Markovian belief-based policies. For POMDPs with stochastic finite-state controllers (FSCs), we show that the latter optimization simplifies to a (finite-dimensional) DCP. We incorporate these DCPs in a policy iteration algorithm to design risk-averse FSCs for POMDPs. We demonstrate the efficacy of the proposed method with numerical experiments involving conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) risk measures.
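As an illustration of the two risk measures named above, the following minimal Python sketch evaluates CVaR and EVaR for a single discrete cost distribution. It is not the paper's method, which synthesizes policies by solving DCPs. It assumes the common variational definitions, CVaR_ε(c) = min_z { z + E[(c − z)^+]/ε } and EVaR_ε(c) = inf_{ζ>0} { log(E[e^{ζc}]/ε)/ζ }, with confidence level ε in (0, 1], where ε = 1 gives the plain expectation for CVaR. The function names `cvar` and `evar`, the grid used for the EVaR search, and the example numbers are all hypothetical.

```python
import math


def cvar(costs, probs, eps):
    """CVaR of a discrete cost distribution at level eps in (0, 1].

    Assumed definition: min over z of  z + E[(c - z)^+] / eps.
    The objective is convex and piecewise linear in z with kinks at the
    support points, so the minimum is attained at one of them.
    """
    return min(
        z + sum(p * max(c - z, 0.0) for c, p in zip(costs, probs)) / eps
        for z in costs
    )


def evar(costs, probs, eps, grid_points=401):
    """Approximate EVaR at level eps in (0, 1] by a log-spaced grid search.

    Assumed definition: inf over zeta > 0 of  log(E[exp(zeta * c)] / eps) / zeta.
    The grid (1e-4 .. 1e4) is an illustrative choice, so the result is an
    upper bound on the true infimum rather than an exact value.
    """
    best = math.inf
    for k in range(grid_points):
        zeta = 10.0 ** (-4.0 + 8.0 * k / (grid_points - 1))
        # Log-sum-exp shift keeps exp() from overflowing for large zeta.
        shift = max(zeta * c for c in costs)
        log_mgf = shift + math.log(
            sum(p * math.exp(zeta * c - shift) for c, p in zip(costs, probs))
        )
        best = min(best, (log_mgf - math.log(eps)) / zeta)
    return best


if __name__ == "__main__":
    costs = [0.0, 10.0]   # hypothetical stage costs
    probs = [0.9, 0.1]    # hypothetical probabilities
    mean = sum(c * p for c, p in zip(costs, probs))
    print("expectation:", mean)
    print("CVaR (eps=0.5):", cvar(costs, probs, 0.5))
    print("EVaR (eps=0.5):", evar(costs, probs, 0.5))
```

In this toy example the expected cost is 1, while CVaR at ε = 0.5 is 2 because it averages only the worst half of the outcomes. The EVaR estimate is at least as large as CVaR, consistent with EVaR being the more conservative of the two measures.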