Abstract

Deep reinforcement learning (DRL) has achieved remarkable milestones in artificial intelligence. However, the reward functions of most real-world tasks are delayed and sparse, which poses significant challenges for DRL methods. To tackle delayed and sparse rewards, many approaches that exploit prior knowledge from expert trajectories have been proposed, such as GAIL and its variants. However, when only suboptimal demonstrations are available, these methods usually struggle to surpass the demonstrators because of the complexity and fragility of adversarial training. To address these problems, this paper introduces a novel framework that combines Self-Imitation learning with Reward Relabeling based Reinforcement learning, dubbed SIR3. SIR3 accelerates online learning from suboptimal demonstrations even in environments with extremely sparse rewards, while still encouraging exploration of better policies. It devises a task-independent reward relabeling mechanism that generates reward signals for both expert examples and online experience, giving the agent more informative guidance even when very few suboptimal demonstrations are available. During training, the combination of imitation learning and RL losses lets the agent dynamically imitate rewarding trajectories, whether collected from experts or discovered through its own exploration. Experiments on widely used MuJoCo benchmarks show that SIR3 efficiently learns policies that surpass the suboptimal demonstrations, achieving better training efficiency and performance than state-of-the-art methods; notably, in some environments its advantage exceeds an order of magnitude.
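
The abstract names two mechanisms, reward relabeling and a combined imitation/RL loss, without detailing either. The Python sketch below illustrates one plausible reading under explicit assumptions: a constant-bonus relabeling rule applied to transitions from "rewarding" trajectories (expert demonstrations or good self-collected rollouts), and a training step that mixes a REINFORCE-style RL loss with a behavioral-cloning-style self-imitation term. The identifiers (relabel, train_step, lam), the relabeling rule, and the loss weighting are illustrative assumptions, not the paper's actual formulation.

    import torch
    import torch.nn as nn

    # Toy continuous-control actor (4-dim state, 2-dim action), standing in
    # for a MuJoCo policy network.
    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

    def relabel(rewards, bonus=1.0):
        # Assumed relabeling rule: replace the sparse/delayed rewards of a
        # trajectory judged "rewarding" with a constant dense per-step bonus.
        return torch.full_like(rewards, bonus)

    def train_step(states, actions, rewards, demo_states, demo_actions, lam=0.5):
        # RL term: plain REINFORCE on relabeled reward-to-go (illustrative only).
        dist = torch.distributions.Normal(policy(states), 1.0)
        returns = relabel(rewards).flip(0).cumsum(0).flip(0)  # reward-to-go
        rl_loss = -(dist.log_prob(actions).sum(-1) * returns).mean()

        # Self-imitation term: regress the policy mean toward actions from
        # rewarding trajectories, whether expert-provided or self-explored.
        sil_loss = ((policy(demo_states) - demo_actions) ** 2).mean()

        loss = rl_loss + lam * sil_loss  # combined imitation + RL objective
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    # Usage with random placeholder data:
    s, a, r = torch.randn(8, 4), torch.randn(8, 2), torch.zeros(8)
    train_step(s, a, r, demo_states=s[:4], demo_actions=a[:4])

The fixed-bonus rule is one way such a scheme can be task-independent: the surrogate signal does not depend on the environment's native reward scale, so the same relabeling applies unchanged across tasks.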
