Abstract

Offline reinforcement learning (RL) aims to learn policies from static datasets. Value overestimation for out-of-distribution (OOD) actions makes it difficult to directly apply standard RL methods in the offline setting. To overcome this problem, many works focus on estimating the value function conservatively or pessimistically. However, existing methods require additional OOD sampling or uncertainty estimation to underestimate OOD values, which makes them complex and sensitive to hyperparameters. Is it possible to design a value function that is automatically conservative on OOD samples? In this study, we reveal the anti-conservative behavior of the widely used ReLU network under certain conditions and explain its cause theoretically. Based on this analysis of the ReLU network, we propose a novel neural network architecture that pushes down the values of samples far from the dataset; we call this architecture the Conservative Network (ConsNet). Building on ConsNet, we propose a new offline RL algorithm that is simple to implement and achieves high performance. Since ConsNet itself provides additional conservatism, integrating it into several existing offline RL methods significantly improves their performance or reduces their algorithmic complexity. Given its simplicity and strong performance, we hope ConsNet can serve as a new fundamental network architecture for offline RL.
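To make the core idea concrete, the sketch below illustrates, in PyTorch, what "pushing down the values of samples far from the dataset" could look like in a value network. This is an illustrative toy only, not the paper's ConsNet architecture (whose design is not detailed in the abstract); the class name, the nearest-neighbor distance penalty, and all hyperparameters are assumptions made for the example.

```python
# Illustrative sketch only: a toy critic whose output is explicitly lowered
# for inputs that lie far from the offline dataset. This is NOT the ConsNet
# architecture from the paper; the distance-based penalty is an assumption.
import torch
import torch.nn as nn


class DistancePenalizedCritic(nn.Module):
    """Q-network whose value shrinks as (s, a) moves away from dataset samples."""

    def __init__(self, state_dim: int, action_dim: int, dataset_sa: torch.Tensor,
                 hidden: int = 256, penalty_scale: float = 1.0):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        # Fixed reference set of (state, action) pairs drawn from the offline data.
        self.register_buffer("dataset_sa", dataset_sa)
        self.penalty_scale = penalty_scale

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        sa = torch.cat([state, action], dim=-1)
        q = self.backbone(sa)
        # Distance from each query to its nearest dataset sample; far-away
        # (likely OOD) inputs receive a larger subtraction, i.e. a lower value.
        dist = torch.cdist(sa, self.dataset_sa).min(dim=-1, keepdim=True).values
        return q - self.penalty_scale * dist


if __name__ == "__main__":
    # Usage: values for in-distribution inputs are left roughly intact, while
    # inputs far from the data are automatically assigned lower values.
    data_sa = torch.randn(1000, 17 + 6)   # hypothetical state/action dimensions
    critic = DistancePenalizedCritic(17, 6, data_sa)
    s, a = torch.randn(4, 17), torch.randn(4, 6)
    print(critic(s, a).shape)             # torch.Size([4, 1])
```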
