Our goal is to enable a robot to learn how to sequence its actions to perform high-level tasks specified as natural language instructions, given successful demonstrations from a human partner. Our novel neuro-symbolic solution GOALNET takes an iterative two-step approach that interleaves (i) inferring the next subgoal predicate implied by the language instruction for a given world state, and (ii) synthesizing a feasible subgoal-reaching plan from that state. The agent executes the plan, and the two steps are repeated. GOALNET combines (i) learning, where dense representations of the language instruction and the world state are acquired via a neural network prediction model, enabling generalization to novel settings, and (ii) planning, where the cause-effect modeling of a classical planner eschews irrelevant predicates, facilitating multi-stage decision making in large domains. GOALNET obtains a 78% improvement in goal-reaching rate over several state-of-the-art approaches on benchmark data with multi-stage instructions. Further, GOALNET can generalize to novel instructions for scenes with unseen objects. Source code is available at https://github.com/reail-iitd/goalnet.
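The interleaved inference-and-planning loop described above can be illustrated with a minimal sketch. The helper names below (infer_subgoal, synthesize_plan, execute, goal_reached) are hypothetical placeholders, not the actual GOALNET API; the sketch only shows the alternation between neural subgoal prediction and classical plan synthesis.

```python
# Minimal sketch of the interleaved subgoal-inference / planning loop.
# All callables passed in are hypothetical placeholders, not GOALNET's API.

def goalnet_loop(instruction, state, infer_subgoal, synthesize_plan,
                 execute, goal_reached, max_steps=20):
    """Repeat: (i) predict the next subgoal predicate implied by the
    instruction for the current world state, (ii) synthesize and execute
    a plan that reaches that subgoal, until the task goal holds."""
    for _ in range(max_steps):
        if goal_reached(instruction, state):
            return state                              # task completed
        subgoal = infer_subgoal(instruction, state)   # neural prediction step
        plan = synthesize_plan(state, subgoal)        # classical planning step
        if plan is None:
            break                                     # no feasible plan found
        state = execute(plan, state)                  # act and observe new state
    return state
```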