Abstract

Reinforcement learning (RL) still suffers from the problem of sample inefficiency and struggles with the exploration issue, particularly in situations with long-delayed rewards, sparse rewards, and deep local optimum. Recently, learning from demonstration (LfD) paradigm was proposed to tackle this problem. However, these methods usually require a large number of demonstrations. In this study, we present a sample efficient teacher-advice mechanism with Gaussian process (TAG) by leveraging a few expert demonstrations. In TAG, a teacher model is built to provide both an advice action and its associated confidence value. Then, a guided policy is formulated to guide the agent in the exploration phase via the defined criteria. Through the TAG mechanism, the agent is capable of exploring the environment more intentionally. Moreover, with the confidence value, the guided policy can guide the agent precisely. Also, due to the strong generalization ability of Gaussian process, the teacher model can utilize the demonstrations more effectively. Therefore, substantial improvement in performance and sample efficiency can be attained. Considerable experiments on sparse reward environments demonstrate that the TAG mechanism can help typical RL algorithms achieve significant performance gains. In addition, the TAG mechanism with soft actor-critic algorithm (TAG-SAC) attains the state-of-the-art performance over other LfD counterparts on several delayed reward and complicated continuous control environments.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.