Abstract
Learning with sparse rewards remains a challenging problem in reinforcement learning (RL). In particular, for sequential object manipulation tasks, the RL agent generally receives a reward only upon successful completion of the entire task, leading to low exploration efficiency. To address this sample inefficiency, we propose a novel self-guided continual RL framework, named Relay Hindsight Experience Replay (RHER). RHER decomposes a sequential task into several subtasks of increasing complexity, allowing the agent to start from the simplest subtask and gradually complete the full task. Crucially, RHER introduces a Self-Guided Exploration Strategy (SGES), in which the already-learned policy for a simpler subtask guides exploration of a more complex one. This strategy allows the agent to overcome the exploration barrier posed by sparse rewards in sequential tasks and to learn efficiently, stage by stage. As a result, the proposed RHER method achieves state-of-the-art performance on the benchmark tasks (FetchPush and FetchPickAndPlace). Furthermore, the experimental results demonstrate the superiority and high efficiency of RHER on a variety of single-object and multi-object manipulation tasks (e.g., ObstaclePush, DrawerBox, TStack, etc.). Finally, the proposed RHER method can also learn a contact-rich task on a real robot from scratch within 250 episodes.
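To make the self-guided exploration idea concrete, the sketch below illustrates one possible way an already-learned simpler-subtask policy could be mixed into exploration for a harder subtask. It is a minimal, hedged illustration only: the function and parameter names (`simple_policy`, `current_policy`, `guide_prob`, `noise_std`) are assumptions for this example and do not reflect the paper's actual implementation or interface.

```python
# Minimal sketch of a self-guided exploration step, assuming a continuous
# action space in [-1, 1]. Not the paper's implementation; names are illustrative.
import numpy as np

def sges_action(obs, simple_policy, current_policy, guide_prob=0.5, noise_std=0.1):
    """Select an action for the harder subtask.

    With probability `guide_prob`, reuse the already-learned policy for the
    simpler subtask to drive the agent toward states from which the harder
    subtask becomes reachable; otherwise act with the policy currently being
    trained, plus Gaussian exploration noise.
    """
    if np.random.rand() < guide_prob:
        action = simple_policy(obs)           # guidance from the solved subtask
    else:
        action = current_policy(obs)          # policy currently being learned
        action = action + np.random.normal(0.0, noise_std, size=action.shape)
    return np.clip(action, -1.0, 1.0)

# Purely illustrative usage with dummy policies:
if __name__ == "__main__":
    dummy_simple = lambda o: np.zeros(4)        # e.g., a learned "reach" policy
    dummy_current = lambda o: 0.1 * np.ones(4)  # e.g., a "push" policy in training
    obs = np.zeros(10)
    print(sges_action(obs, dummy_simple, dummy_current))
```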