Abstract

Reinforcement Learning (RL) enables an agent to learn control policies for achieving its long-term goals. One key parameter of RL algorithms is the discount factor, which scales down future cost in the current value estimate of a state. This study introduces and analyses a transition-based discount factor in two model-free reinforcement learning algorithms, Q-learning and SARSA, and shows their convergence using the theory of stochastic approximation for finite state and action spaces. The resulting asymmetric discounting, which favours some transitions over others, (1) yields faster convergence than the constant-discount-factor variants of these algorithms, as demonstrated by experiments on the Taxi domain and the MountainCar environment, and (2) provides better control over whether an RL agent learns a risk-averse or a risk-taking policy, as demonstrated in a Cliff Walking experiment.
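
To make the idea concrete, below is a minimal sketch of tabular Q-learning in which the constant discount factor is replaced by a function of the transition (s, a, s'). It assumes a Gymnasium-style environment with discrete state and action spaces (such as Taxi); the function name q_learning_transition_discount, the argument gamma_fn, and the hyperparameter values are illustrative and not taken from the paper.

```python
import numpy as np

def q_learning_transition_discount(env, gamma_fn, alpha=0.1, epsilon=0.1,
                                    episodes=500):
    """Tabular Q-learning where the discount is a function of the
    transition (s, a, s') rather than a constant.

    `env` is assumed to expose a Gymnasium-style reset()/step() interface
    with discrete observation and action spaces; `gamma_fn(s, a, s_next)`
    returns the discount applied to that particular transition.
    """
    n_states, n_actions = env.observation_space.n, env.action_space.n
    Q = np.zeros((n_states, n_actions))

    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                a = env.action_space.sample()
            else:
                a = int(np.argmax(Q[s]))

            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated

            # Transition-dependent discount replaces the constant gamma;
            # in an episodic task gamma_fn would typically return 0 for
            # transitions into a terminal state.
            g = gamma_fn(s, a, s_next)
            td_target = r + g * np.max(Q[s_next])
            Q[s, a] += alpha * (td_target - Q[s, a])
            s = s_next
    return Q
```

The only change from standard Q-learning is that each update evaluates gamma_fn on the observed transition, which is what produces the asymmetric discounting described above.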

Highlights

  • Reinforcement Learning (RL) algorithms train a software agent to determine a policy that solves an RL problem as efficiently as possible

  • We provide a way to set the discount factor that can be used in tasks with finite state and action spaces where a positive reward is given at the goal state

  • After defining Q-learning with the adaptive discount factor, we present the convergence of this algorithm

Summary

Introduction

RL algorithms train a software agent to determine a policy that solves an RL problem as efficiently as possible. Inspired by Wei [22], Yoshida gave convergence results for the Q-learning algorithm with a state-dependent discount factor [30]. These studies directly investigate the discount factor's role in various cases, using both a test-problem approach and a general function approximation setting. A transition-based discount has a benefit over a constant one: it provides a general unified theory for both episodic and non-episodic tasks. This setting of the discount factor has not yet been studied for the Q-learning and SARSA algorithms. The novelty of our work is the introduction of a discount factor that is a function of the transition variables (current state, action, and next state) into the two model-free reinforcement learning algorithms mentioned above, together with convergence results for the same.
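
As an illustration of how a transition-based discount can unify episodic and non-episodic tasks, the sketch below builds a gamma(s, a, s') that returns zero on transitions into goal states and can discount a chosen set of transitions less than the rest. The helper name make_transition_discount, the argument names, and the numeric values are illustrative assumptions, not the paper's notation.

```python
def make_transition_discount(goal_states, gamma_default=0.99,
                             favoured=None, gamma_favoured=0.999):
    """Illustrative construction of a transition-based discount gamma(s, a, s')."""
    favoured = favoured or set()

    def gamma_fn(s, a, s_next):
        if s_next in goal_states:
            # Zero discount on transitions into a goal (terminal) state lets
            # episodic tasks be expressed in the same framework as
            # continuing ones.
            return 0.0
        if (s, a, s_next) in favoured:
            # Asymmetric discounting: favoured transitions are discounted
            # less, so value propagates more strongly along them.
            return gamma_favoured
        return gamma_default

    return gamma_fn
```

A function built this way could be passed as gamma_fn to a Q-learning or SARSA update, such as the Q-learning sketch given after the abstract.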

Background
Q-Learning
Transition Dependent Discount Factor
Q-Learning with Transition-Based Discounts
Experiments
Mountain Car
Discussion and Conclusions