Abstract

The tangled program graph (TPG) framework was recently proposed as an emergent process for decomposing tasks and simultaneously composing solutions by organizing code into graphs of teams of programs. The initial evaluation assessed the ability of TPG to discover agents capable of playing Atari game titles under the Arcade Learning Environment. This is an example of ‘visual’ reinforcement learning, i.e. agents are evolved directly from the frame buffer without recourse to hand-designed features. TPG was able to evolve solutions competitive with state-of-the-art deep reinforcement learning solutions, but at a fraction of the complexity. One simplifying assumption was that the visual input could be downsampled from a \(210 \times 160\) resolution to \(42 \times 32\). In this work, we consider the challenging 3D first-person shooter environment of ViZDoom and require that agents be evolved at the original visual resolution of \(320 \times 240\) pixels. In addition, we address issues in developing agents capable of operating in multiple ViZDoom task environments simultaneously. The resulting TPG solutions retain all the emergent properties of the original work, as well as its computational efficiency. Moreover, the solutions appear to generalize across multiple task scenarios, whereas equivalent solutions from deep reinforcement learning have focused on single task scenarios alone.
