Abstract
Applying reinforcement learning (RL) methods to robots typically involves training a policy in simulation and deploying it on a robot in the real world. Because of the model mismatch between the real world and the simulator, RL agents deployed in this manner tend to perform suboptimally. To tackle this problem, researchers have developed robust policy learning algorithms that rely on synthetic noise disturbances. However, such methods do not guarantee performance in the target environment. We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain using trajectory data from both environments. We show that this estimator can be used along with the simulator to evaluate the performance of an RL agent in the target domain, effectively bridging the gap between the two environments. We also show that the convergence rate of our estimator is of the order of $n^{-1/4}$, where $n$ is the number of training samples. In simulation, we demonstrate how our method effectively approximates and evaluates performance on Gridworld, Cartpole, and Reacher environments over a range of policies. We also show that our method is able to estimate the performance of a 7-DOF robotic arm using the simulator and remotely collected data from the robot in the real world.
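To make the idea concrete, below is a minimal sketch of one standard way such a mismatch estimator can be instantiated: a classifier-based density-ratio estimate obtained by minimizing a convex logistic risk on transitions from both domains, followed by importance-weighted evaluation of simulated rollouts. This is an illustrative assumption, not the paper's exact estimator; the function names, features, and loss are hypothetical.

```python
import numpy as np

# Hypothetical sketch: estimate a transition-level density ratio between the
# real robot and the simulator with a logistic classifier (a convex risk
# minimization problem), then reweight simulated returns with that ratio.

def fit_mismatch_ratio(sim_transitions, real_transitions, lr=0.1, epochs=500):
    """Logistic regression on (s, a, s') features; label 1 = real, 0 = sim.

    The log-odds of the fitted classifier approximate log p_real / p_sim,
    so exp(log-odds) approximates the model-mismatch density ratio.
    """
    X = np.vstack([sim_transitions, real_transitions])
    y = np.concatenate([np.zeros(len(sim_transitions)),
                        np.ones(len(real_transitions))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (p - y) / len(y)   # gradient of the convex logistic risk
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b

    def ratio(transition_features):
        return np.exp(transition_features @ w + b)  # estimated p_real / p_sim
    return ratio


def evaluate_in_target(sim_rollouts, ratio, gamma=0.99):
    """Importance-weight simulated rollouts to approximate target-domain return.

    Each rollout is a list of (transition_features, reward) pairs collected
    in the simulator under the policy being evaluated.
    """
    estimates = []
    for rollout in sim_rollouts:
        weight, ret = 1.0, 0.0
        for t, (phi, r) in enumerate(rollout):
            weight *= ratio(phi)          # accumulate per-step mismatch weights
            ret += (gamma ** t) * weight * r
        estimates.append(ret)
    return float(np.mean(estimates))
```

In this sketch, a policy is rolled out only in the simulator; the learned ratio corrects each simulated step toward the target dynamics, yielding an estimate of the policy's return on the real system without further real-world rollouts.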