Abstract
Model-Based Reinforcement Learning (MBRL) algorithms have been shown to have an advantage in data efficiency, but they are often overshadowed by state-of-the-art model-free methods in performance, especially on high-dimensional and complex problems. In this work, we propose a novel MBRL method called Risk-Aware Model-Based Control (RAMCO). It combines uncertainty-aware deep dynamics models with the risk-assessment technique Conditional Value at Risk (CVaR). This mechanism is suitable for real-world applications because it takes epistemic risk into consideration. In addition, we use a model-free solver to produce warm-up training data, a setting that improves performance in low-dimensional environments and compensates for the inherent weaknesses of MBRL in high-dimensional scenarios. In comparison with other state-of-the-art reinforcement learning algorithms, we show that RAMCO produces superior results on a walking-robot model. We also evaluate the method in an Eidos environment, a novel experimental setup that uses multi-dimensional, randomly initialized deep neural networks to measure the performance of any reinforcement learning algorithm, and the advantages of RAMCO are highlighted.
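To make the risk-assessment step concrete, the following minimal Python sketch shows how CVaR can be computed over returns sampled from an ensemble of learned dynamics models. This is an illustration under stated assumptions, not the paper's implementation: the ensemble rollouts are replaced by random stand-in values, and the function name `cvar` is illustrative.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional Value at Risk of a batch of sampled returns.

    CVaR_alpha is the expected return over the worst alpha-fraction of
    outcomes; maximizing it instead of the mean yields risk-averse
    (pessimistic) action selection.
    """
    returns = np.sort(np.asarray(returns))          # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(returns))))  # size of the worst tail
    return returns[:k].mean()

# Toy usage: in an MBRL planner, a candidate action sequence could be rolled
# out through each member of a bootstrapped dynamics-model ensemble and scored
# by the CVaR of the resulting returns. Random values stand in for rollouts.
rng = np.random.default_rng(0)
ensemble_returns = rng.normal(loc=10.0, scale=3.0, size=50)
print(cvar(ensemble_returns, alpha=0.1))  # pessimistic score for this candidate
```

Because the tail average reflects disagreement among the ensemble members, this score penalizes action sequences whose outcomes the learned models are uncertain about, which is the epistemic-risk consideration described above.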
Highlights
The controllers of robots are primarily designed and tuned by human engineers through tiresome iterations, requiring extensive experience and a high degree of expertise (Deisenroth et al., 2013)
For the environments InvertedDoublePendulum-v2, HalfCheetah-v2, Hopper-v2, Walker2d-v2, BipedalWalker-v3, LunarLanderContinuous-v2, MinitaurBulletEnv-v0, and AntX, warm-up data produced by a model-free policy lead to better training of the dynamics model than random samples (a minimal collection loop is sketched after these highlights)
Of the two model-free policies compared, Soft Actor-Critic (SAC) performed remarkably well in many warm-up trials, which is consistent with the nature of SAC as a method that maximizes the entropy of the policy
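As a hedged illustration of the warm-up phase referenced above, the sketch below collects (state, action, next state) transitions for dynamics-model training. It assumes the classic Gym step API (4-tuple returns, as used by the -v2 MuJoCo environments listed in the highlights); `policy` is a placeholder for a trained SAC actor, which is not shown here.

```python
import gym  # classic Gym API: reset() -> obs, step() -> (obs, reward, done, info)

def collect_warmup(env, policy, n_steps=5000):
    """Collect (s, a, s') transitions for warm-up training of a dynamics model.

    `policy` maps an observation to an action. Passing a trained model-free
    actor here, instead of random sampling, is the warm-up setting described
    in the highlights above.
    """
    data, obs = [], env.reset()
    for _ in range(n_steps):
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        data.append((obs, action, next_obs))
        obs = env.reset() if done else next_obs
    return data

env = gym.make("Hopper-v2")  # requires MuJoCo; any continuous-control env works
random_policy = lambda obs: env.action_space.sample()  # baseline: random samples
warmup = collect_warmup(env, random_policy)  # swap in a SAC actor for better data
```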
Summary
The controllers of robots are primarily designed and tuned by human engineers through tiresome iterations, requiring extensive experience and a high degree of expertise (Deisenroth et al., 2013). The resulting programmed controllers are built on the assumption of rigorous models of both the robot's behavior and its environment. Hard-coded controllers have their limitations when a robot needs to adapt to a new situation or when the robot or its environment cannot be precisely modeled. Unlike other branches of machine learning, reinforcement learning (RL) is still not widely applied to real-world engineering products, especially in the field of robotics. The main obstacles to applying RL to such problems are 1) data inefficiency, 2) lack of robustness, and 3) lack of practical advantage over hand-tuned controllers.