Abstract

Environments with sparse rewards are a common problem in reinforcement learning, and agents learn inefficiently in them with standard methods. A new method called trial-and-error experience replay is proposed, in which standard hindsight experience replay is combined with a curiosity-driven model so that sample efficiency improves even though extrinsic rewards are sparse. The method is demonstrated as an algorithm that controls a virtual robotic arm to reach a moving goal. Analysis shows that the robotic arm can explore and learn from failed trajectories, so the agent mimics a human who fails repeatedly but still tries to learn something from the unexpected outcomes.
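To illustrate the idea, the following is a minimal sketch of how hindsight goal relabeling might be combined with a curiosity bonus, as the abstract describes. It is not the paper's implementation: the function names, the "final-state" relabeling strategy, the distance-threshold reward, and the toy linear forward model are all illustrative assumptions.

```python
import numpy as np

def curiosity_bonus(forward_model, state, action, next_state):
    """Intrinsic reward: prediction error of a (learned) forward model."""
    predicted = forward_model(state, action)
    return float(np.sum((predicted - next_state) ** 2))

def relabel_and_store(buffer, episode, forward_model, eta=0.01, eps=0.05):
    """Store original transitions plus hindsight-relabeled copies.

    episode: list of (state, action, next_state, goal) tuples.
    eta: weight of the curiosity bonus added to the reward (assumed).
    eps: tolerance for declaring a goal reached (assumed).
    """
    achieved = episode[-1][2]  # final achieved state becomes the new goal
    for state, action, next_state, goal in episode:
        intrinsic = eta * curiosity_bonus(forward_model, state, action, next_state)
        # Sparse extrinsic reward against the original goal.
        r = 0.0 if np.linalg.norm(next_state - goal) < eps else -1.0
        buffer.append((state, action, next_state, goal, r + intrinsic))
        # Hindsight copy: pretend the achieved outcome was the goal,
        # so even a failed trajectory yields a useful success signal.
        r_h = 0.0 if np.linalg.norm(next_state - achieved) < eps else -1.0
        buffer.append((state, action, next_state, achieved, r_h + intrinsic))

# Toy usage: a linear predictor stands in for a learned forward model.
rng = np.random.default_rng(0)
forward = lambda s, a: s + 0.1 * a
episode = [(rng.normal(size=3), rng.normal(size=3),
            rng.normal(size=3), np.zeros(3)) for _ in range(5)]
buffer = []
relabel_and_store(buffer, episode, forward)
print(len(buffer))  # 10 transitions: 5 original + 5 hindsight-relabeled
```

The relabeled copies are what let the agent learn from failures: every trajectory reaches *some* state, so treating that state as the goal in hindsight converts a failed episode into a successful one for training purposes, while the curiosity term keeps exploration going when both reward signals are flat.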
