Abstract
Although imitation learning can learn an optimal policy from expert demonstrations, it may fail to transfer to practical environments: high-quality demonstrations are difficult to collect, and with imperfect demonstrations the resulting policy is not accurate enough and converges slowly. To solve this problem, an algorithm referred to as Non-negative Positive-unlabeled Importance Weighting Imitation Learning (PUIWIL) is proposed. It uses Non-negative Positive-unlabeled learning (nnPU) as a probabilistic classifier to evaluate the quality of demonstrations, which increases the utilization of imperfect demonstrations and improves the performance of imitation learning. PUIWIL introduces a confidence score for each expert demonstration, calculated by the nnPU classifier, which indicates the probability that the demonstration was generated by an optimal policy, and reweights all expert demonstrations according to their confidence scores. In addition, PUIWIL reconstructs the standard GAIL framework so that high-quality demonstrations have a more significant impact on imitation learning, an approach called Best-in-class Imitation. Experiments demonstrate that PUIWIL improves both the performance and robustness of imitation learning from imperfect demonstrations.
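The abstract does not give formulas, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It uses the commonly cited non-negative PU risk estimator with a sigmoid loss, and it assumes that a confidence score is the sigmoid of the classifier's raw output and that weights are normalised to have mean 1. The function names (`nnpu_risk`, `confidence_weights`), the class prior value, and the normalisation are illustrative choices, not details confirmed by the paper.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_loss(score, label):
    # Sigmoid surrogate loss l(z, y) = sigmoid(-y * z), with y in {+1, -1}.
    return sigmoid(-label * score)


def mean(values):
    return sum(values) / len(values)


def nnpu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk for raw classifier scores.

    pos_scores: scores on samples known to be positive (assumed optimal).
    unl_scores: scores on unlabeled samples (mixed quality).
    prior: assumed fraction of positives among the unlabeled samples.
    """
    risk_p_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_p_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    risk_u_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # The estimated negative-class risk is clipped at zero; this clipping
    # is what makes the estimator "non-negative".
    negative_part = risk_u_neg - prior * risk_p_neg
    return prior * risk_p_pos + max(0.0, negative_part)


def confidence_weights(scores):
    # Assumption: confidence = sigmoid(score), rescaled so weights average 1.
    conf = [sigmoid(s) for s in scores]
    total = sum(conf)
    return [c * len(conf) / total for c in conf]


# Hypothetical classifier outputs for illustration only.
risk = nnpu_risk(pos_scores=[5.0, 5.0], unl_scores=[-5.0, -5.0], prior=0.9)
weights = confidence_weights([2.0, 0.0, -2.0])
```

In a reweighted GAIL-style objective, weights of this kind would multiply each demonstration's term in the discriminator loss, so that demonstrations judged more likely to be optimal contribute more to training.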