Abstract

Deep Q-learning combines the Q-learning algorithm with a function approximation technique such as a neural network. According to Zai and Brown (2020), the main idea behind Q-learning is to use an algorithm to predict the value of a state-action pair and then compare that prediction to the accumulated rewards observed at some later time; the parameters of the algorithm are then updated so that it makes better predictions next time. While this technique has advantages that make it very useful for solving reinforcement learning problems, it falls short on complex problems with large state spaces. Google DeepMind supported this conclusion in its seminal paper entitled “Human-level control through deep reinforcement learning” (Mnih et al. 2015). In this paper, the authors asserted that “to use reinforcement learning successfully in situations approaching real-world complexity, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experiences to new situations”. To achieve this objective, they stated further, “we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks”. While Q-learning as a tool for solving reinforcement learning problems has enjoyed some remarkable successes in the past, it was not until the introduction of DQN that practitioners were able to use it to solve large-scale problems. Prior to that, Mnih et al. (2015) argued, reinforcement learning was limited to “applications and domains in which useful features could be handcrafted, or to domains with fully observed, low-dimensional state spaces”.
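The update loop described above, predicting the value of a state-action pair, comparing it with the observed return, and adjusting the parameters, can be sketched roughly as follows. This is a minimal illustrative sketch assuming PyTorch; the network architecture, the names QNetwork and dqn_update, and the hyperparameters are assumptions for illustration, not details taken from the cited paper.

```python
# Minimal sketch of a Q-learning update with a neural-network function
# approximator (a DQN-style step). All names and sizes are illustrative.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step: move Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    states, actions, rewards, next_states, dones = batch
    # Predicted value of each state-action pair in the batch.
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrapped target: observed reward plus the discounted value of the
    # best next action (zero if the episode ended).
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * (1.0 - dones) * q_next
    # Update the parameters so the prediction moves toward the target.
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A separate target network is used here only because it is a common stabilisation choice in DQN-style training; the core idea remains the predict-compare-update cycle described in the abstract.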
