Abstract

In real-world problems such as robotics, finance, and healthcare, randomness is always present; it is therefore important to take risk into account in order to limit the chance of rare but dangerous events. The literature on risk-averse reinforcement learning has produced many different approaches to the problem, but they either struggle to scale to complex instances or exhibit irrational behaviors. Here we present two novel risk-averse objectives that are both coherent and easy to optimize: the reward-based mean-mean absolute deviation (Mean-RMAD) and the reward-based conditional value at risk (RCVaR). Rather than reducing the risk of the return, these measures minimize the risk of the per-step reward. We prove that these risk measures bound their return-based counterparts, so they can also serve as proxies for the return-based versions. We develop safe algorithms with guaranteed monotonic improvement for these risk measures, together with practical trust-region versions. Furthermore, we propose a decomposition of the RCVaR optimization problem into a sequence of risk-neutral problems. Finally, we conduct an empirical analysis of the introduced approaches, demonstrating their effectiveness in retrieving a variety of risk-averse behaviors on both toy problems and more challenging ones, such as a simulated trading environment and robotic locomotion tasks.
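
For reference, a minimal sketch of the standard definitions underlying these objectives, assuming X denotes the penalized random variable (for the reward-based measures above, the per-step reward under the policy's occupancy distribution), lambda >= 0 a risk-aversion coefficient, and alpha in (0, 1] a confidence level; the paper's exact reward-based formulations may differ in weighting and normalization:

\[
\text{Mean-MAD}_{\lambda}(X) = \mathbb{E}[X] - \lambda\,\mathbb{E}\big[\lvert X - \mathbb{E}[X] \rvert\big],
\qquad
\text{CVaR}_{\alpha}(X) = \max_{\nu \in \mathbb{R}} \Big\{ \nu - \tfrac{1}{\alpha}\,\mathbb{E}\big[(\nu - X)_{+}\big] \Big\}.
\]

The second expression is the Rockafellar-Uryasev representation of CVaR: for a fixed \(\nu\), the objective reduces to an ordinary expectation of a shifted, truncated reward, which is what typically allows a CVaR optimization to be decomposed into a sequence of risk-neutral problems with an outer update of \(\nu\).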
