Abstract
Reinforcement learning (RL) algorithms aim to learn optimal decisions in unknown environments through the experience of taking actions and observing the rewards gained. When the environment is not influenced by the actions of the RL agent, the problem can be modeled as a contextual multi-armed bandit, and lightweight myopic algorithms can be employed. When the agent's actions do affect the environment, the problem must be modeled as a Markov decision process, and more complex RL algorithms, which take the future effects of actions into account, are required. In practice, however, it is often unknown from the outset whether the agent's actions will impact the environment, so it is not possible to determine in advance which RL algorithm is most fitting. In this work, we propose to avoid this difficult decision entirely by incorporating a choice mechanism into our RL framework. Rather than assuming a specific problem structure, we use a probabilistic structure estimation procedure based on a likelihood-ratio (LR) test to make a more informed selection of the learning algorithm. We derive a sufficient condition under which myopic policies are optimal, present an LR test for this condition, and bound the regret of our framework. We describe real-world scenarios where such a framework is needed and validate our approach with extensive simulations.
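The selection step described above can be illustrated with a small sketch. The code below is a hypothetical illustration, not the paper's actual procedure: it assumes finite state and action spaces and a batch of observed transitions, compares the likelihood of a state-only transition model (bandit structure) against a state-action model (MDP structure), and applies a chi-square threshold via Wilks' theorem to decide whether the actions appear to influence the environment. The function name, data layout, and significance level are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def lr_test_action_effect(transitions, n_states, n_actions, alpha=0.05):
    """Hypothetical sketch of an LR test for whether actions affect transitions.

    H0: the next state depends only on the current state (bandit structure).
    H1: the next state depends on the (state, action) pair (MDP structure).

    `transitions` is an iterable of (s, a, s_next) index tuples gathered from
    experience. Returns True if H0 is rejected (treat the problem as an MDP).
    """
    # Empirical transition counts under both models.
    counts_sa = np.zeros((n_states, n_actions, n_states))
    counts_s = np.zeros((n_states, n_states))
    for s, a, s_next in transitions:
        counts_sa[s, a, s_next] += 1
        counts_s[s, s_next] += 1

    # Maximum log-likelihood under H1, using ML estimates of P(s'|s,a).
    row_sa = counts_sa.sum(axis=2, keepdims=True)
    p_sa = np.divide(counts_sa, row_sa, out=np.zeros_like(counts_sa), where=row_sa > 0)
    ll_h1 = np.sum(counts_sa * np.log(p_sa, out=np.zeros_like(p_sa), where=p_sa > 0))

    # Maximum log-likelihood under H0, using ML estimates of P(s'|s).
    row_s = counts_s.sum(axis=1, keepdims=True)
    p_s = np.divide(counts_s, row_s, out=np.zeros_like(counts_s), where=row_s > 0)
    ll_h0 = np.sum(counts_s * np.log(p_s, out=np.zeros_like(p_s), where=p_s > 0))

    # Wilks' theorem: 2 * (ll_h1 - ll_h0) is asymptotically chi-square under H0,
    # with degrees of freedom equal to the extra free parameters of H1.
    stat = 2.0 * (ll_h1 - ll_h0)
    dof = n_states * (n_actions - 1) * (n_states - 1)
    return stat > chi2.ppf(1 - alpha, dof)
```

In such a scheme, the outcome of the test would steer the agent toward a lightweight myopic (contextual-bandit) learner when H0 is retained, and toward a full MDP-based RL algorithm when H0 is rejected.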