Abstract
Developing a self-learning model for a game is challenging because the environment changes continuously, so the model must make decisions in real time based on the current state of the environment. The agent observes the environment and takes an action based on what it infers. Depending on the action, the agent receives a positive or negative reward, which it uses to update its policy and behave better in the environment. This work aims to train an agent with deep reinforcement learning algorithms to play a multiplayer online game such as SLITHER.IO. We use an OpenAI Universe environment to collect raw image inputs from sample gameplay as training data. The agent learns the current state of the environment and the positions of the other players (snakes), then acts by choosing its direction of movement. We train agents with deep Q-learning and with actor-critic approaches such as Proximal Policy Optimisation (PPO), using reward shaping and a replay buffer, and compare them with existing systems and a random policy. Of these algorithms, the PPO agent shows a significant improvement in score over a range of episodes: it learns quickly, and its reward progression is higher than that of the other techniques.
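To make two of the ingredients named above concrete, the following is a minimal sketch of reward shaping and a replay buffer. It is not the implementation used in this work: the names `shaped_reward`, `ReplayBuffer` and `death_penalty`, the penalty value, and the use of integers in place of image frames are all assumptions made for illustration.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done). States are integer placeholders
# here; in the setting described above they would be raw image frames.
Transition = Tuple[int, int, float, int, bool]


def shaped_reward(score_delta: float, died: bool, death_penalty: float = 10.0) -> float:
    """Hypothetical shaping rule: reward score gains and penalise dying."""
    return score_delta - (death_penalty if died else 0.0)


class ReplayBuffer:
    """Fixed-capacity store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity: int) -> None:
        # A bounded deque drops the oldest transition once capacity is reached.
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Sampling without replacement; batch_size must not exceed len(self).
        return rng.sample(list(self._items), batch_size)

    def __len__(self) -> int:
        return len(self._items)
```

Sampling uniformly from stored transitions, rather than learning only from the most recent frame, is the usual reason a replay buffer is paired with deep Q-learning: it reduces the correlation between consecutive training samples.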