Abstract

Reinforcement Learning (RL) algorithms learn a mapping from states to actions in order to maximize an expected reward and derive an optimal policy. However, traditional learning algorithms rarely consider that learning has an associated cost and that the resources available for learning may be limited. It is therefore natural to think of learning over a limited budget. If we are developing a learning algorithm for an agent, e.g. a robot, we should consider that it may have a limited amount of battery; if we do the same for a finance broker, it will have a limited amount of money. Both examples require planning according to a limited budget. Another important concept, related to budget-aware reinforcement learning, is the risk profile, which captures how risk-averse the agent is. The risk profile can be used as an input to the learning algorithm so that different policies can be learned according to how much risk the agent is willing to expose itself to. This paper describes a new strategy to incorporate the agent's risk profile as an input to the learning framework by using reward shaping. The paper also studies the effect of a constrained budget on RL and shows that, under such restrictions, RL algorithms can be forced to make more efficient use of the available resources. The experiments show that, even though it is possible to learn under a constrained budget, with low budgets the learning process becomes slow. They also show that the reward shaping process is able to guide the agent to learn a less risky policy.
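To make the two ideas concrete, the sketch below combines a global interaction budget with risk-sensitive reward shaping in plain tabular Q-learning. It is only an illustrative assumption of how such a setup could look, not the paper's actual formulation: the chain environment, the `step` dynamics, the `shaped_reward` penalty, and the `risk_aversion` parameter are all hypothetical names introduced here for illustration.

```python
import numpy as np

# Hypothetical 1-D chain environment: states 0..N_STATES-1, goal at the right end.
# Action 0 = "safe" step (always advances one state), action 1 = "risky" step
# (usually advances two states, but occasionally fails badly). Illustrative only.
N_STATES = 10
GOAL = N_STATES - 1

def step(state, action, rng):
    """Return (next_state, base_reward) under the toy dynamics above."""
    if action == 0:                       # safe: small, predictable progress
        nxt, reward = min(state + 1, GOAL), -1.0
    else:                                 # risky: faster, but 20% chance of a setback
        if rng.random() < 0.2:
            nxt, reward = max(state - 2, 0), -10.0
        else:
            nxt, reward = min(state + 2, GOAL), -1.0
    if nxt == GOAL:
        reward += 20.0                    # bonus for reaching the goal
    return nxt, reward

def shaped_reward(base_reward, action, risk_aversion):
    """Reward shaping: penalize risky actions in proportion to the agent's
    risk profile (risk_aversion in [0, 1])."""
    risk_penalty = 2.0 if action == 1 else 0.0
    return base_reward - risk_aversion * risk_penalty

def q_learning(budget, risk_aversion, episodes=500, alpha=0.1, gamma=0.95,
               epsilon=0.1, seed=0):
    """Tabular Q-learning in which every interaction with the environment
    consumes one unit of a global learning budget."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, 2))
    remaining = budget
    for _ in range(episodes):
        state = 0
        while state != GOAL and remaining > 0:
            remaining -= 1                               # each step spends budget
            if rng.random() < epsilon:
                action = int(rng.integers(2))            # epsilon-greedy exploration
            else:
                action = int(np.argmax(q[state]))
            nxt, base = step(state, action, rng)
            r = shaped_reward(base, action, risk_aversion)
            q[state, action] += alpha * (r + gamma * np.max(q[nxt]) - q[state, action])
            state = nxt
        if remaining <= 0:                               # budget exhausted: stop learning
            break
    return q

if __name__ == "__main__":
    q_averse = q_learning(budget=20_000, risk_aversion=1.0)
    q_neutral = q_learning(budget=20_000, risk_aversion=0.0)
    print("greedy actions (risk-averse):", np.argmax(q_averse, axis=1))
    print("greedy actions (risk-neutral):", np.argmax(q_neutral, axis=1))
```

Under these assumptions, raising `risk_aversion` makes the shaped penalty on the risky action outweigh its speed advantage, so the greedy policy shifts toward safe steps, while shrinking `budget` cuts the number of environment interactions and slows convergence of the Q-values.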
