Abstract

Interpretable learning agents directly construct models that provide insight into the relationships learnt. However, to date, most emphasis has been placed on interpreting reactive models developed for supervised learning tasks. In this work, we consider models developed to address a suite of six partially observable tasks defined in the Dota 2 multiplayer online battle arena game engine. Partial observability means that learning agents must make decisions based on state retained in the agent's memory, in addition to the 310-dimensional state vector provided by the game engine. Interpretability is addressed by adopting the tangled program graph approach to developing learning agents, under which decision-making is explicitly divide-and-conquer, with different parts of the resulting graph visited depending on the task context. We demonstrate that the programs comprising a tangled program graph self-organize such that: (1) small subsets of task features define the conditions under which indexed memory is written, and (2) the programs responsible for defining actions typically query indexed memory rather than task features. Distinct preferences emerge across tasks: the blocking (or evasion) tasks result in a preference for specific actions, whereas more open-ended tasks produce policies based on combinations of behaviours. In short, the ability to evolve the topology of the learning agent provides insight into how policies are constructed for partially observable tasks.
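
To make the mechanism concrete, the following minimal Python sketch illustrates the style of policy the abstract describes: programs organized into a graph of teams, where the winning program either writes a task feature into indexed memory or defines the agent's action, with memory persisting between decisions. The class structure, memory size, and linear program representation are illustrative assumptions, not the authors' implementation.

```python
import random

STATE_DIM = 310   # game-engine state vector size (from the abstract)
MEM_SIZE = 8      # indexed-memory size: an assumption for illustration

class Program:
    """Bids by weighting a small subset of inputs drawn from either the
    task features ('obs') or the indexed memory ('mem')."""
    def __init__(self, n_terms=3):
        self.terms = [(random.choice(['obs', 'mem']),
                       random.randrange(STATE_DIM),
                       random.uniform(-1.0, 1.0))
                      for _ in range(n_terms)]

    def bid(self, obs, mem):
        return sum(w * (obs[i] if src == 'obs' else mem[i % MEM_SIZE])
                   for src, i, w in self.terms)

class Team:
    """A graph node: member programs bid, and the winner's action is followed.
    Actions are ('atomic', id), ('team', Team), or ('write', (cell, feature))."""
    def __init__(self, members):
        self.members = members  # list of (Program, action) pairs

    def act(self, obs, mem, visited=None):
        visited = set() if visited is None else visited
        visited.add(id(self))
        # Visit programs in bid order; writes update indexed memory, team
        # pointers descend the graph, atomic actions end the traversal.
        for prog, (kind, payload) in sorted(
                self.members, key=lambda m: m[0].bid(obs, mem), reverse=True):
            if kind == 'write':
                cell, feature = payload
                mem[cell] = obs[feature]          # memory write condition fired
            elif kind == 'team' and id(payload) not in visited:
                return payload.act(obs, mem, visited)
            elif kind == 'atomic':
                return payload                    # e.g. a move/attack action
        return 0                                  # fallback atomic action

# Memory persists across decisions, so earlier observations can influence
# later actions under partial observability.
leaf = Team([(Program(), ('atomic', 1)), (Program(), ('atomic', 2))])
root = Team([(Program(), ('write', (0, 42))),
             (Program(), ('team', leaf)),
             (Program(), ('atomic', 3))])
mem = [0.0] * MEM_SIZE
for _ in range(3):
    obs = [random.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
    print(root.act(obs, mem))
```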
