Abstract

In the last chapter, we studied the various aspects of the brain-academy architecture of the ML-Agents Toolkit and examined the scripts that are essential for an agent to make decisions according to a policy. In this chapter, we look into the core concepts of deep reinforcement learning (RL) in Python and its interaction with the C# scripts of the brain-academy architecture. We have already had a glimpse of deep RL when we briefly discussed the deep Q-learning algorithm in the OpenAI Gym environment (CartPole) and when we discussed OpenAI's Baselines library. While training ML-Agents in TensorFlow through an external brain, we also used the proximal policy optimization (PPO) algorithm with the default hyperparameters in the trainer_config.yaml file. We will discuss these algorithms in depth, along with several other algorithms from the actor-critic paradigm. To fully understand this chapter, however, we first have to understand how to build deep learning networks using TensorFlow and the Keras module, along with the basic concepts of deep learning and why it is required in the current context. Through this chapter we will also create neural network models for computer vision methods, which will be extremely important when we study the GridWorld environment. Since ray and camera sensors primarily provide the observation space for the agent, most of our models will have two variants of policies: multilayer perceptron (MLP-based) networks and convolutional neural network (CNN-2D-based) networks. We will also look into other simulations and games created with the ML-Agents Toolkit and will try to train our models based on the Baselines implementations by OpenAI. First, however, let us understand the fundamentals of generic neural network models in deep learning.
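To make the MLP-based policy variant mentioned above concrete, the sketch below shows, in plain NumPy, what such a network computes: a vector of ray-sensor readings is passed through a hidden layer and a softmax to produce action probabilities. All sizes (36 ray readings, 64 hidden units, 4 discrete actions) and the random weights are illustrative assumptions, standing in for a trained TensorFlow/Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Element-wise rectified linear activation.
    return np.maximum(0.0, x)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 36 ray-sensor readings in, 4 discrete actions out.
obs_dim, hidden, n_actions = 36, 64, 4

# Randomly initialized weights stand in for trained parameters.
W1 = rng.standard_normal((obs_dim, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, n_actions)) * 0.1
b2 = np.zeros(n_actions)

def mlp_policy(obs):
    """Map a batch of observations to action probabilities."""
    h = relu(obs @ W1 + b1)          # hidden layer
    return softmax(h @ W2 + b2)      # distribution over actions

# Forward pass on a batch of 5 dummy observations.
probs = mlp_policy(rng.standard_normal((5, obs_dim)))
print(probs.shape)        # (5, 4): one distribution per observation
```

The CNN-2D variant follows the same pattern, except that camera observations (image tensors) are first passed through convolutional layers before the final dense softmax layer; in practice both variants would be built with Keras layers rather than raw NumPy.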
