Abstract

The learning process in reinforcement learning is time-consuming because, in early episodes, the agent relies heavily on exploration. The proposed "coaching" approach aims to accelerate learning in settings with sparse environmental rewards, and it works well with linear $\epsilon$-greedy Q-learning with eligibility traces. To coach an agent, a human coach provides an intermediate target as a sub-goal for the agent to pursue. This sub-goal gives an additional cue that guides the agent toward the actual terminal state. During the coaching phase, the agent pursues the intermediate target with an aggressive policy; however, the reward associated with the intermediate target is not used to update the state-action values, only the environmental reward is. After a small number of coaching episodes, learning proceeds normally with an $\epsilon$-greedy policy. In this way, the agent ends up with an optimal policy that is not under the influence or supervision of the human coach. The proposed method has been tested on three experimental tasks: mountain car, ball following, and obstacle avoidance. Even with human coaches of varying skill levels, the experimental results show that this method speeds up the agent's learning process in all tasks.
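
The coaching mechanism described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a tabular agent (the paper uses linear function approximation), a simple accumulating-trace Q(lambda) update, and a hypothetical coach_policy callable and env interface supplied by the reader.

```python
import numpy as np

def coached_q_lambda(env, n_states, n_actions, coach_policy,
                     coaching_episodes=5, total_episodes=200,
                     alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
    """Q-learning with eligibility traces, preceded by a few coached episodes.

    coach_policy(state) -> action: an aggressive policy, supplied by the human
    coach, that drives the agent toward an intermediate target (sub-goal).
    During coaching, actions come from this policy, but Q-values are still
    updated with the sparse environmental reward only.
    """
    Q = np.zeros((n_states, n_actions))
    for episode in range(total_episodes):
        E = np.zeros_like(Q)                     # eligibility traces
        state = env.reset()
        done = False
        coaching = episode < coaching_episodes   # coach only the early episodes
        while not done:
            if coaching:
                action = coach_policy(state)          # aggressive move toward sub-goal
            elif np.random.rand() < epsilon:
                action = np.random.randint(n_actions) # explore
            else:
                action = int(np.argmax(Q[state]))     # exploit
            next_state, env_reward, done = env.step(action)
            # Only the environmental reward enters the update; reaching the
            # coach's sub-goal contributes no extra reward term.
            td_error = env_reward + gamma * np.max(Q[next_state]) - Q[state, action]
            E[state, action] += 1.0
            Q += alpha * td_error * E            # update all traced state-action pairs
            E *= gamma * lam                     # decay the traces
            state = next_state
    return Q
```

After the coaching episodes end, the loop above reduces to ordinary $\epsilon$-greedy Q(lambda), so the final policy is shaped only by the environmental reward, consistent with the claim that it is not under the coach's supervision.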
