Abstract

Offline reinforcement learning (RL) algorithms promise to learn policies directly from offline datasets without environmental interaction. This paradigm enables RL applications in the real world, particularly in robotics and autonomous driving, where data collection is costly and dangerous. However, existing offline RL algorithms suffer from degraded performance attributed to extrapolation error caused by out-of-distribution (OOD) actions. In this work, we propose an offline RL algorithm with an uncertain action constraint (UAC). The design principle of UAC is to minimize the extrapolation error by eliminating unknown and uncertain actions. Concretely, we first theoretically analyze the effects of different types of actions on the extrapolation error. Based on this analysis, we propose an action-constrained strategy that exploits the uncertainty of the environmental dynamics model to eliminate unknown and uncertain actions during Q-value evaluation. Furthermore, a convex combination of trajectory information and Gaussian noise is leveraged to increase the probability of generating optimal actions. Finally, we carry out comparison and ablation experiments on the standard D4RL benchmark. Experimental results indicate that UAC achieves competitive performance, especially in robotic manipulation tasks.
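The following is a minimal illustrative sketch of the idea described above, not the paper's implementation: an ensemble of dynamics models estimates uncertainty for candidate actions, actions whose uncertainty exceeds a threshold are excluded from the Q-target, and candidates are formed as a convex combination of a behavior (trajectory) action and Gaussian noise. The ensemble, threshold, mixing coefficient, and placeholder critic are all assumptions introduced for illustration.

```python
import numpy as np

# Hypothetical ensemble of learned dynamics models: each maps (state, action) -> next state.
# Random linear models serve purely as stand-ins for trained networks.
rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, ENSEMBLE_SIZE = 4, 2, 5
ensemble = [(rng.normal(size=(STATE_DIM, STATE_DIM)), rng.normal(size=(STATE_DIM, ACTION_DIM)))
            for _ in range(ENSEMBLE_SIZE)]

def dynamics_uncertainty(state, action):
    """Disagreement (mean std over state dims) of the ensemble's next-state predictions."""
    preds = np.stack([A @ state + B @ action for A, B in ensemble])
    return preds.std(axis=0).mean()

def constrained_q_target(state, candidate_actions, q_fn, threshold):
    """Evaluate the Q-target only over candidates whose model uncertainty is below the
    threshold; unknown/uncertain actions are excluded from the maximization."""
    admissible = [a for a in candidate_actions if dynamics_uncertainty(state, a) <= threshold]
    if not admissible:  # fall back to the least-uncertain candidate
        admissible = [min(candidate_actions, key=lambda a: dynamics_uncertainty(state, a))]
    return max(q_fn(state, a) for a in admissible)

# Toy usage: candidates are a convex combination of a behavior action from the dataset
# trajectory and Gaussian noise; the critic is a placeholder.
def q_fn(state, action):
    return -np.linalg.norm(action)  # stand-in critic

behavior_action = np.array([0.3, -0.1])
mix = 0.8  # convex-combination coefficient (assumed hyperparameter)
candidates = [mix * behavior_action + (1 - mix) * rng.normal(size=ACTION_DIM) for _ in range(10)]
state = rng.normal(size=STATE_DIM)
print(constrained_q_target(state, candidates, q_fn, threshold=1.5))
```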
