This work investigates how natural language task descriptions can accelerate reinforcement learning in games. Recognizing that human descriptions often imply a hierarchical task structure, we propose a method that extracts this hierarchy and converts it into "options", i.e., policies for solving subtasks. Options are generated by grounding natural language descriptions into environment states, which then serve as subtask boundaries; option policies are learned either from prior successful traces or from human-created walkthroughs. We evaluate our approach in both a simple grid-world environment and the more complex text-based game Zork, comparing option-based agents against standard Q-learning and random agents. Our results demonstrate that incorporating natural language task knowledge yields faster and more efficient reinforcement learning across different environments and Q-learning algorithms, including tabular Q-learning and Deep Q-Networks.
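To make the option mechanism concrete, the following is a minimal sketch of tabular Q-learning whose action set is augmented with options whose termination states come from grounded subtask boundaries. This is an illustration under stated assumptions, not the authors' implementation: the `env` interface (`reset`, `step`, `num_actions`), the `Option` class, and all hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

class Option:
    """A temporally extended action: a subtask policy that terminates
    at states grounded from a natural language task description."""
    def __init__(self, policy, termination_states):
        self.policy = policy                   # state -> primitive action
        self.termination = termination_states  # grounded subtask boundary

def run_option(env, option, state):
    """Follow the option's policy until a boundary state (or episode end)."""
    reward_sum, steps, done = 0.0, 0, False
    while state not in option.termination and not done:
        state, r, done = env.step(option.policy[state])
        reward_sum += r
        steps += 1
    return state, reward_sum, steps, done

def q_learning_with_options(env, options, episodes=500,
                            alpha=0.1, gamma=0.99, epsilon=0.1):
    # The agent chooses among primitive actions and options alike.
    choices = list(range(env.num_actions)) + list(options)
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy selection over the augmented action set.
            if random.random() < epsilon:
                i = random.randrange(len(choices))
            else:
                i = max(range(len(choices)), key=lambda j: Q[(state, j)])
            a = choices[i]
            if isinstance(a, Option):
                next_state, r, k, done = run_option(env, a, state)
                discount = gamma ** max(k, 1)  # SMDP-style multi-step discount
            else:
                next_state, r, done = env.step(a)
                discount = gamma
            target = r + discount * max(
                Q[(next_state, j)] for j in range(len(choices)))
            Q[(state, i)] += alpha * (target - Q[(state, i)])
            state = next_state
    return Q
```

Because an option executes many primitive steps before returning control, the discount is compounded over its duration (the SMDP-style update above); this is what lets a single successful subtask policy shortcut long stretches of exploration that flat Q-learning would otherwise have to discover step by step.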