Critic-Guided Decision Transformer for Offline Reinforcement Learning

Yuanfu Wang,Ying Wen,Chao Yang,Yu Qiao,Yu Liu

doi:10.1609/aaai.v38i14.29499

Abstract

Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Return-Conditioned Supervised Learning (RCSL), a paradigm that learns the action distribution based on target returns for each state in a supervised manner. However, prevailing RCSL methods largely focus on deterministic trajectory modeling, disregarding stochastic state transitions and the diversity of future trajectory distributions. A fundamental challenge arises from the inconsistency between the sampled returns within individual trajectories and the expected returns across multiple trajectories. Fortunately, value-based methods offer a solution by leveraging a value function to approximate the expected returns, thereby addressing the inconsistency effectively. Building upon these insights, we propose a novel approach, termed the Critic-Guided Decision Transformer (CGDT), which combines the predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer. By incorporating a learned value function, known as the critic, CGDT ensures a direct alignment between the specified target returns and the expected returns of actions. This integration bridges the gap between the deterministic nature of RCSL and the probabilistic characteristics of value-based methods. Empirical evaluations on stochastic environments and D4RL benchmark datasets demonstrate the superiority of CGDT over traditional RCSL methods. These results highlight the potential of CGDT to advance the state of the art in offline RL and extend the applicability of RCSL to a wide range of RL tasks.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Critic-Guided Decision Transformer for Offline Reinforcement Learning

Abstract

Talk to us

Similar Papers

More From: Proceedings of the AAAI Conference on Artificial Intelligence

Lead the way for us

Similar Papers

Convex Programs and Lyapunov Functions for Reinforcement Learning: A Unified Perspective on the Analysis of Value-Based Methods
Xingang Guo ... Bin Hu
-
Xingang Guo, et. al.Xingang Guo ... Bin Hu
08 Jun 2022
08 Jun 2022

Reinforcement Learning for Clinical Applications.
Kia Khezeli ... Benjamin Shickel
Clinical journal of the American Society of Nephrology : CJASN | VOL. 18
Kia Khezeli, et. al.Kia Khezeli ... Benjamin Shickel
08 Feb 2023
Clinical journal of the American Society of Nephrology : CJASN | VOL. 18

A deep actor critic reinforcement learning framework for learning to rank
Vaibhav Padhye ... Kailasam Lakshmanan
Neurocomputing | VOL. 547
Vaibhav Padhye, et. al.Vaibhav Padhye ... Kailasam Lakshmanan
18 May 2023
Neurocomputing | VOL. 547

Manufacturing Dispatching Using Reinforcement and Transfer Learning
...
-
, et. al. ...
16 Sep 2019
16 Sep 2019

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Critic-Guided Decision Transformer for Offline Reinforcement Learning

Abstract

Talk to us

Similar Papers

More From: Proceedings of the AAAI Conference on Artificial Intelligence