Abstract
The poor sample efficiency of reinforcement learning (RL) and the need for high-quality demonstrations in imitation learning (IL) hinder their application to real-world robots. To address this challenge, a novel self-evolution framework, named task-oriented self-imitation learning (TOSIL), is proposed. To avoid the need for external demonstrations, the top-K self-generated trajectories are selected as expert data from both a per-episode exploration perspective and a long-term return perspective. Each transition is assigned a guide reward formulated from these trajectories. The guide rewards are updated as the agent evolves, encouraging beneficial exploration behaviors. This methodology ensures that the agent explores in directions relevant to the task, improving sample efficiency and asymptotic performance. Experimental results on locomotion and manipulation tasks show that the proposed framework outperforms other state-of-the-art RL methods. Furthermore, incorporating suboptimal trajectories can further improve sample efficiency while maintaining performance. This represents a significant step toward autonomous skill acquisition for robots.
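The following is a minimal Python sketch of the idea described above: a buffer that keeps only the K highest-return self-generated trajectories as expert data, plus a guide reward computed from them. The class and function names (`TopKTrajectoryBuffer`, `guide_reward`) are hypothetical, the sketch ranks trajectories by episode return only (the paper also uses a per-episode exploration perspective), and the similarity-based guide reward shown here is an assumption for illustration, not the paper's exact formulation.

```python
# Illustrative sketch of top-K self-imitation with a guide reward.
# Assumptions: trajectories are ranked by episode return, and the guide
# reward is a nearest-neighbor similarity to stored (state, action) pairs.
import heapq
import itertools
import numpy as np


class TopKTrajectoryBuffer:
    def __init__(self, k=10):
        self.k = k
        self._counter = itertools.count()  # tie-breaker for equal returns
        self._heap = []                    # min-heap of (return, id, trajectory)

    def add(self, trajectory, episode_return):
        """Keep the trajectory only if it ranks among the top-K returns."""
        item = (episode_return, next(self._counter), trajectory)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
        elif episode_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)

    def transitions(self):
        """All (state, action) pairs currently treated as expert data."""
        return [sa for _, _, traj in self._heap for sa in traj]


def guide_reward(state, action, buffer, scale=1.0):
    """Hypothetical guide reward: similarity to the closest stored transition."""
    expert = buffer.transitions()
    if not expert:
        return 0.0
    query = np.concatenate([state, action])
    dists = [np.linalg.norm(query - np.concatenate([s, a])) for s, a in expert]
    return scale * np.exp(-min(dists))  # in (0, 1], larger when closer to expert data


if __name__ == "__main__":
    buf = TopKTrajectoryBuffer(k=3)
    rng = np.random.default_rng(0)
    for _ in range(5):  # five self-generated episodes
        traj = [(rng.normal(size=4), rng.normal(size=2)) for _ in range(20)]
        buf.add(traj, episode_return=float(rng.normal()))
    s, a = rng.normal(size=4), rng.normal(size=2)
    print("guide reward:", guide_reward(s, a, buf))
```

Because the buffer is refreshed with the agent's own best trajectories, the guide reward shifts as the agent improves, which mirrors the self-evolution behavior described in the abstract.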