E xploration E xploitation Problem in Policy Based Deep Reinforcement Learning for Episodic and Continuous Environments

Vedang Naik,Rohit Sahoo,Saurabh Singh,Shaveta Malik,Sameer Mahajan

doi:10.35940/ijeat.b3267.1211221

Abstract

Reinforcement learning is an artificial intelligence paradigm that enables intelligent agents to accrue environmental incentives to get superior results. It is concerned with sequential decision-making problems which offer limited feedback. Reinforcement learning has roots in cybernetics and research in statistics, psychology, neurology, and computer science. It has piqued the interest of the machine learning and artificial intelligence groups in the last five to ten years. It promises that it allows you to train agents using rewards and penalties without explaining how the task will be completed. The RL issue may be described as an agent that must make decisions in a given environment to maximize a specified concept of cumulative rewards. The learner is not taught which actions to perform but must experiment to determine which acts provide the greatest reward. Thus, the learner has to actively choose between exploring its environment or exploiting it based on its knowledge. The exploration-exploitation paradox is one of the most common issues encountered while dealing with Reinforcement Learning algorithms. Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. We describe how to utilize several deep reinforcement learning (RL) algorithms for managing a Cartpole system used to represent episodic environments and Stock Market Trading, which is used to describe continuous environments in this study. We explain and demonstrate the effects of different RL ideas such as Deep Q Networks (DQN), Double DQN, and Dueling DQN on learning performance. We also look at the fundamental distinctions between episodic and continuous activities and how the exploration-exploitation issue is addressed in their context.

Full Text