Abstract

Learning an imitation policy offline from expert demonstrations is an important yet challenging problem. Despite great success, most existing methods assume that the data are uncorrupted by latent confounders. However, such unobserved confounders arise in many real-world applications and can lead to sub-optimal policies. In this paper, we therefore propose an integrated two-stage algorithm for offline causal imitation learning that allows for the presence of latent confounders. In Stage 1, we determine whether latent confounders are present, using a causal discovery method based on conditional independence tests. In Stage 2, we apply behavioral cloning in the unconfounded case, or a variant of instrumental variable regression in the confounded case, to eliminate possible confounding influences. Experiments on a robotic arm control task verify the efficacy of the approach in both confounded and unconfounded settings.
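The two stages described above can be sketched on synthetic data. The snippet below is a minimal illustration, not the paper's actual algorithm: it assumes a linear setting in which a latent confounder `u` affects both the state `s` and the expert action `a`, and an instrument `z` (e.g., an exogenous past state) affects `s` only. Stage 1 checks for confounding via a conditional-independence test (if no confounder exists, `z` is independent of `a` given `s`); Stage 2 uses plain regression (a behavioral-cloning analogue) when no confounding is found, and a simple instrumental-variable (Wald) estimator otherwise. All variable names and the data-generating process are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical data-generating process (assumed for illustration):
# instrument z -> state s; latent confounder u -> both s and action a.
u = rng.normal(size=n)                          # latent confounder (unobserved)
z = rng.normal(size=n)                          # instrument
s = z + u + rng.normal(size=n)                  # observed state
b_true = 2.0
a = b_true * s + 2.0 * u + rng.normal(size=n)   # expert action, confounded by u

# Stage 1 (sketch): conditional-independence test via partial correlation.
# If u were absent, z would be independent of a given s; under confounding,
# conditioning on s (a collider) induces a nonzero partial correlation.
def partial_corr(x, y, cond):
    """Correlation of x and y after linearly regressing out cond."""
    X = np.column_stack([np.ones_like(cond), cond])
    rx = x - X @ np.linalg.lstsq(X, x, rcond=None)[0]
    ry = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rho = partial_corr(z, a, s)
confounded = abs(rho) > 0.05   # threshold in place of a formal significance test

# Stage 2 (sketch): behavioral cloning (plain least squares) if unconfounded,
# else an IV estimate that is consistent despite the latent confounder.
ols_b = np.cov(s, a)[0, 1] / np.var(s)           # biased upward by u
iv_b = np.cov(z, a)[0, 1] / np.cov(z, s)[0, 1]   # Wald/IV estimator

print(f"confounded={confounded}, OLS={ols_b:.2f}, IV={iv_b:.2f}")
```

On this synthetic data, the ordinary least-squares fit overestimates the true policy coefficient (here 2.0) because the confounder inflates the state–action covariance, while the IV estimate remains close to it.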
