Abstract

Model-Based Reinforcement Learning (MBRL) algorithms have been shown to have an advantage in data efficiency, but they are often overshadowed by state-of-the-art model-free methods in performance, especially on high-dimensional and complex problems. In this work, a novel MBRL method is proposed, called Risk-Aware Model-Based Control (RAMCO). It combines uncertainty-aware deep dynamics models with the risk assessment technique Conditional Value at Risk (CVaR). This mechanism is suitable for real-world applications since it takes epistemic risk into consideration. In addition, we use a model-free solver to produce warm-up training data; this setting improves performance in low-dimensional environments and compensates for the inherent weaknesses of MBRL in high-dimensional scenarios. In comparison with other state-of-the-art reinforcement learning algorithms, we show that RAMCO produces superior results on a walking robot model. We also evaluate the method in an Eidos environment, a novel experimental setup that uses multi-dimensional, randomly initialized deep neural networks to measure the performance of any reinforcement learning algorithm; there, too, the advantages of RAMCO are highlighted.
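
The abstract does not include implementation details, but the risk measure at the core of RAMCO, CVaR, is standard and easy to illustrate. The minimal sketch below computes the CVaR of returns sampled from a hypothetical ensemble of dynamics models; the function name `cvar`, the confidence level `alpha`, and the toy ensemble returns are illustrative assumptions, not the authors' code.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional Value at Risk: mean of the worst alpha-fraction of returns."""
    returns = np.sort(np.asarray(returns))          # ascending, so worst returns come first
    k = max(1, int(np.ceil(alpha * len(returns))))  # size of the lower tail
    return returns[:k].mean()

# Toy usage: suppose each ensemble member produced one return estimate for a
# candidate action sequence. A risk-aware planner would maximize cvar(...)
# rather than the mean return, penalizing plans with catastrophic worst cases.
rng = np.random.default_rng(0)
ensemble_returns = rng.normal(loc=100.0, scale=20.0, size=50)
print(cvar(ensemble_returns, alpha=0.1))
```

Optimizing CVaR instead of the expected return is what makes the disagreement among ensemble members (epistemic uncertainty) matter: plans whose outcome depends heavily on which model is right score poorly.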

Highlights

  • The controllers of robots are primarily designed and tuned by human engineers through tiresome iterations, a process that requires extensive experience and a high degree of expertise (Deisenroth et al., 2013)

  • For the environments InvertedDoublePendulum-v2, HalfCheetah-v2, Hopper-v2, Walker2d-v2, BipedalWalker-v3, LunarLanderContinuous-v2, MinitaurBulletEnv-v0, and AntX, a model-free policy can produce warm-up data that yield better training of the dynamics model than random samples

  • Between the two model-free policies, Soft Actor-Critic (SAC) showed remarkable behavior in many warm-up trials, which agrees with the nature of SAC as a method that maximizes the entropy of the policy (a minimal warm-up collection loop is sketched after this list)
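
As a concrete illustration of the warm-up setting referenced in the highlights above, here is a minimal sketch of collecting transitions for dynamics-model pretraining. The function `collect_warmup` and the policy passed to it are hypothetical; the classic `gym` step API is assumed, matching the -v2 environments listed above.

```python
import gym

def collect_warmup(env, policy, steps=10_000):
    """Collect (obs, action, reward, next_obs) transitions to pretrain a dynamics model."""
    data, obs = [], env.reset()
    for _ in range(steps):
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        data.append((obs, action, reward, next_obs))
        obs = env.reset() if done else next_obs
    return data

# Random-policy baseline; a partially trained SAC actor would be passed in
# the same way to reproduce the model-free warm-up described above.
env = gym.make("Walker2d-v2")
random_policy = lambda obs: env.action_space.sample()
warmup_data = collect_warmup(env, random_policy)
```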


Summary

Introduction

The controllers of robots are primarily designed and tuned by human engineers through tiresome iterations and require extensive experience and a high degree of expertise (Deisenroth et al., 2013). The resulting programmed controllers are built on the assumption of rigorous models of both the robot's behavior and its environment. Hard-coded controllers have their limitations when a robot needs to adapt to a new situation or when the robot or environment cannot be precisely modeled. Unlike other branches of machine learning, reinforcement learning (RL) is still not widely applied to real-world engineering products, especially in the field of robotics. The main obstacles to the application of RL to such problems are 1) data inefficiency, 2) lack of robustness, and 3) lack of practical advantage over hand-tuned controllers.

