Entropy regularized reinforcement learning using large deviation theory

Argenis Arriojas, Rahul V Kulkarni, Stas Tiomkin, Jacob Adamczyk

doi:10.1103/physrevresearch.5.023085

https://doi.org/10.1103/physrevresearch.5.023085

Abstract

Reinforcement learning (RL) is an important field of research in machine learning that is increasingly being applied to complex optimization problems in physics. In parallel, concepts from physics have contributed to important advances in RL with developments such as entropy-regularized RL. While these developments have led to advances in both fields, obtaining analytical solutions for optimization in entropy-regularized RL is currently an open problem. In this paper, we establish a mapping between entropy-regularized RL and research in non-equilibrium statistical mechanics focusing on Markovian processes conditioned on rare events. In the long-time limit, we apply approaches from large deviation theory to derive exact analytical results for the optimal policy and optimal dynamics in Markov Decision Process (MDP) models of reinforcement learning. The results obtained lead to a novel analytical and computational framework for entropy-regularized RL which is validated by simulations. The mapping established in this work connects current research in reinforcement learning and non-equilibrium statistical mechanics, thereby opening new avenues for the application of analytical and computational approaches from one field to cutting-edge problems in the other.

Journal: Physical Review Research
Publication Date: May 10, 2023
Citations: 1
License type: CC BY 4.0

Similar Papers

Learning to rank with click-through features in a reinforcement learning framework
Amir Hosein Keyhanipour ... Farhad Oroumchian
International Journal of Web Information Systems | VOL. 12
07 Nov 2016

SeaRank: relevance prediction based on click models in a reinforcement learning framework
Amir Hosein Keyhanipour ... Farhad Oroumchian
Data Technologies and Applications | VOL. 57
08 Sep 2022

Research on Portfolio Optimization Models Using Deep Deterministic Policy Gradient
Li Wei ... Zhang Weiwei
01 Nov 2020

Author response: Associability-modulated loss learning is increased in posttraumatic stress disorder
Vanessa M Brown ... John M Wang
19 Oct 2017
