Parameterized Projected Bellman Operator

Théo Vincent, Marcello Restelli, Carlo D'Eramo, Alberto Maria Metelli, Jan Peters, Boris Belousov

doi:10.1609/aaai.v38i14.29465

Abstract

Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transition samples, which strongly determine its behavior, as uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. To address these issues, we propose a novel alternative approach based on learning an approximate version of the Bellman operator rather than estimating it through samples as in AVI approaches. This way, we are able to (i) generalize across transition samples and (ii) avoid the computationally intensive projection step. For this reason, we call our novel operator projected Bellman operator (PBO). We formulate an optimization problem to learn PBO for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems. Furthermore, we theoretically study our approach under the lens of AVI and devise algorithmic implementations to learn PBO in offline and online settings by leveraging neural network parameterizations. Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems.
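The two-step AVI loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' PBO method or code; it only shows the baseline procedure that PBO is meant to replace. The toy chain MDP, the one-hot features, and all function names below are assumptions made for illustration.

```python
import numpy as np

def make_chain_samples(n_states=5):
    """Enumerate every (s, a, r, s') transition of a hypothetical deterministic chain MDP."""
    samples = []
    for s in range(n_states):
        for a in (0, 1):  # 0 = move left, 1 = move right
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0  # reward for landing on the last state
            samples.append((s, a, r, s_next))
    return samples

def features(s, a, n_states):
    """One-hot feature vector for a state-action pair (assumed function space)."""
    phi = np.zeros(2 * n_states)
    phi[2 * s + a] = 1.0
    return phi

def approximate_value_iteration(samples, n_states, gamma=0.9, n_iterations=50):
    """Alternate an empirical Bellman backup with a least-squares projection."""
    theta = np.zeros(2 * n_states)
    Phi = np.stack([features(s, a, n_states) for s, a, _, _ in samples])
    for _ in range(n_iterations):
        # (i) Apply the empirical Bellman operator on the transition samples.
        targets = np.array([
            r + gamma * max(features(s_next, b, n_states) @ theta for b in (0, 1))
            for _, _, r, s_next in samples
        ])
        # (ii) Project the targets onto the function space via least squares.
        theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return theta

n_states = 5
theta = approximate_value_iteration(make_chain_samples(n_states), n_states)
q_values = theta.reshape(n_states, 2)       # rows: states, columns: actions
greedy_actions = q_values.argmax(axis=1)    # greedy policy under the learned Q
```

In this sketch every iteration needs both the sample-based backup and the projection. According to the abstract, PBO instead learns an approximate operator, which is what lets it generalize across transition samples and skip the projection step.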

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 1

