Optimized look‐ahead tree policies: a bridge between look‐ahead tree policies and direct policy search

Tobias Jung, Francis Maes, Damien Ernst, Louis Wehenkel

doi:10.1002/acs.2387


Open Access

https://doi.org/10.1002/acs.2387


Abstract

SUMMARY: Direct policy search (DPS) and look‐ahead tree (LT) policies are two popular techniques for solving difficult sequential decision‐making problems. Both are simple to implement, widely applicable without strong assumptions on the structure of the problem, and capable of producing high‐performance control policies. Computationally, however, both are very expensive, each in its own way. DPS can require huge offline resources (the effort required to obtain the policy), first to select a space of parameterized policies that works well for the targeted problem and then to determine the best parameter values via global optimization. LT policies require no offline resources, but they typically require huge online resources (the effort required to calculate the best decision at each step) in order to grow trees of sufficient depth. In this paper, we propose optimized LTs (OLTs), a model‐based policy learning scheme that lies at the intersection of DPS and LT. In OLT, the control policy is represented indirectly through an algorithm that, at each decision step, uses a model of the dynamics to develop a small LT, as in LT, until a prespecified online budget is exhausted. Unlike in LT, the development of the tree is not driven by a generic heuristic; rather, the heuristic is optimized for the target problem and implemented as a parameterized node scoring function learned offline via DPS. We experimentally compare OLT with pure DPS and pure LT variants on optimal control benchmark domains. The results show that the LT‐based representation is a versatile way of compactly representing policies in a DPS scheme (which results in OLT being easier to tune and having lower offline complexity than pure DPS) and that, at the same time, DPS helps to significantly reduce the size of the LTs required to make high‐quality decisions (which results in OLT having lower online complexity than pure LT). Moreover, OLT produces overall better performing policies than pure DPS and pure LT, and its policies are robust with respect to perturbations of the initial conditions.

Copyright © 2013 John Wiley & Sons, Ltd.
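The two-stage idea described in the abstract (online: grow a small tree under a fixed budget, guided by a parameterized node scoring function; offline: tune the parameters by direct policy search) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy `model`, the three node `features`, the linear `score`, the rule for picking the final action, and plain random search standing in for the global optimizer.

```python
import random
from dataclasses import dataclass
from typing import Optional

ACTIONS = (-1, +1)
GAMMA = 0.9

def model(state, action):
    """Hypothetical deterministic toy model: walk on integers in [-5, 5], reward at 3."""
    nxt = max(-5, min(5, state + action))
    return nxt, (1.0 if nxt == 3 else 0.0)

@dataclass
class Node:
    state: int
    depth: int
    ret: float                   # discounted return accumulated from the root
    first_action: Optional[int]  # root action leading to this node (None for root)

def features(node):
    # Assumed feature set; the abstract does not specify one.
    return (float(node.depth), node.ret, float(node.state))

def score(theta, node):
    # Assumed linear scoring function: theta . features(node)
    return sum(t * f for t, f in zip(theta, features(node)))

def olt_policy(state, theta, budget=8):
    """Expand the best-scoring open node until `budget` expansions are used,
    then return the first action on the path to the best-return open node."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    open_nodes = [Node(state, 0, 0.0, None)]
    for _ in range(budget):
        i = max(range(len(open_nodes)), key=lambda k: score(theta, open_nodes[k]))
        best = open_nodes.pop(i)
        for a in ACTIONS:
            nxt, r = model(best.state, a)
            first = a if best.first_action is None else best.first_action
            open_nodes.append(
                Node(nxt, best.depth + 1, best.ret + GAMMA ** best.depth * r, first)
            )
    return max(open_nodes, key=lambda n: n.ret).first_action

def rollout_return(theta, start=0, horizon=10, budget=8):
    """Discounted return obtained by following the tree policy on the model."""
    state, total = start, 0.0
    for t in range(horizon):
        state, r = model(state, olt_policy(state, theta, budget))
        total += GAMMA ** t * r
    return total

def random_search(n_candidates=30, seed=0):
    """Offline stage: random search as a simple stand-in for global optimization."""
    rng = random.Random(seed)
    candidates = [tuple(rng.uniform(-1, 1) for _ in range(3))
                  for _ in range(n_candidates)]
    return max(candidates, key=rollout_return)

theta = random_search()
print("selected theta:", theta, "return:", rollout_return(theta))
```

In this sketch the online cost is controlled only by `budget` (the number of node expansions per decision), while the offline cost is the number of candidate parameter vectors evaluated by `random_search`; these correspond loosely to the online and offline resources contrasted in the abstract.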


Journal: International Journal of Adaptive Control and Signal Processing
Publication Date: Feb 11, 2013
Citations: 58
License type: other-oa

Similar Papers

Optimized look-ahead trees: Extensions to large and continuous action spaces
Tobias Jung ... Damien Ernst
01 Apr 2013

Neuro-Evolutionary Direct Policy Search for Multiobjective Optimal Control
Marta Zaniolo ... Matteo Giuliani
IEEE Transactions on Neural Networks and Learning Systems | VOL. 33
01 Oct 2022

Instance-based Policy Learning by Real-coded Genetic Algorithms and Its Application to Control of Nonholonomic Systems
Atsushi Miyamae ... Isao Ono
Transactions of the Japanese Society for Artificial Intelligence | VOL. 24
01 Jan 2009

Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent
Adrien Bolland ... Damien Ernst
Journal of Artificial Intelligence Research | VOL. 73
05 Jan 2022
