Abstract

We study the estimation of risk-sensitive policies in reinforcement learning (RL) problems defined by a Markov decision process (MDP) whose state and action spaces are finite. Prior efforts are predominantly afflicted by computational challenges stemming from the fact that risk-sensitive MDPs are time-inconsistent; hence modifications of Bellman's equations are required, which often do not permit efficient fixed-point schemes. To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual of the linear programming (LP) formulation of tabular RL. Caution quantifies the distributional risk of a policy and is a function of the policy's long-term state occupancy measure. When the risk is a convex function of the occupancy measure, we propose to solve this problem in an online, model-free manner with a stochastic variant of the primal-dual method that uses the Kullback-Leibler (KL) divergence as its proximal term. We establish that the sample complexity required to attain near-optimal solutions matches tight dependencies on the cardinality of the state and action spaces, but also depends on the infinity norm of the gradient of the risk measure. Moreover, when the risk is non-convex, we propose a block-coordinate augmentation of the aforementioned approach and establish its convergence to KKT points. Experiments demonstrate that this approach improves the reliability of reward accumulation without additional computation as compared to risk-neutral LP solvers, for both convex and non-convex risks, instantiated respectively as the KL divergence to a prior and the variance.
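The KL-proximal update at the heart of the convex case can be illustrated in isolation. The sketch below (an assumption for illustration, not the paper's full algorithm) runs exponentiated-gradient ascent, i.e. mirror ascent with a KL proximal term, over a probability simplex standing in for the occupancy measure, maximizing expected reward minus a convex KL caution penalty toward a prior; the LP's flow constraints and the stochastic sampling are omitted for brevity, and all names and the toy problem instance are hypothetical. This objective admits a closed-form maximizer proportional to `prior * exp(r / beta)`, which the iterates should approach.

```python
import numpy as np

def kl_mirror_step(lam, grad, eta):
    """One exponentiated-gradient step:
    argmax_x <grad, x> - (1/eta) * KL(x || lam) over the simplex."""
    w = lam * np.exp(eta * grad)
    return w / w.sum()

def solve_cautious(r, prior, beta, eta=0.1, iters=500):
    """Maximize r^T lam - beta * KL(lam || prior) over the simplex
    via KL-proximal mirror ascent (deterministic gradients for clarity)."""
    lam = np.full_like(r, 1.0 / len(r))  # start from the uniform distribution
    for _ in range(iters):
        # gradient of the penalized objective w.r.t. lam
        grad = r - beta * (np.log(lam / prior) + 1.0)
        lam = kl_mirror_step(lam, grad, eta)
    return lam

r = np.array([1.0, 0.5, 0.2, 0.0])       # per-state rewards (toy)
prior = np.full(4, 0.25)                  # caution prior over states
lam = solve_cautious(r, prior, beta=1.0)

# closed-form maximizer of the penalized objective, up to normalization
target = prior * np.exp(r)
target /= target.sum()
```

The multiplicative form of the update keeps every iterate strictly inside the simplex, which is why the KL divergence is the natural proximal term for occupancy-measure variables.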
