Abstract

Behavioral Cloning (BC) is a simple and effective imitation learning algorithm, which suffers from compounding error due to covariate shift. One solution is to use enough data for training. However, the amount of expert demonstrations available is usually limited. So we propose an effective method to augment expert demonstrations to alleviate the problem of compounding error in BC. It operates by estimating the similarity of states and filtering out transitions that can go back to the states similar to ones in expert demonstrations during the process of sampling. The data filtered out along with original expert demonstrations are used for training. We evaluate the performance of our method on several Atari tasks and continuous MuJoCo control tasks. Empirically, BC trained with the augmented data significantly outperform BC trained with the original expert demonstrations.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call