Simultaneous perturbation algorithms for batch off-policy search

Raphael Fonteneau, L A Prashanth

doi:10.1109/cdc.2014.7039790

Open Access

https://doi.org/10.1109/cdc.2014.7039790

Publication Date: Dec 1, 2014

Citations: 19

Affiliation: Inria research centre Lille - Nord Europe

#Batch Mode Reinforcement Learning #Policy Search Algorithms

Abstract

We propose novel policy search algorithms for off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces. Given a batch of trajectories, we perform off-line policy evaluation using an algorithm similar to that of [Fonteneau et al., 2010]. Using this Monte Carlo-like policy evaluator, we perform policy search in a class of parameterized policies. We propose both first-order policy gradient and second-order policy Newton algorithms. All our algorithms incorporate simultaneous perturbation estimates for the gradient as well as the Hessian of the cost-to-go vector, since the latter is unknown and only biased estimates are available. We demonstrate their practicality on a simple one-dimensional continuous state space problem.
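To make the idea of a simultaneous perturbation gradient estimate concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a generic two-sided estimate with random ±1 perturbations, and it replaces the batch policy evaluator with a hypothetical deterministic quadratic `toy_cost`. The function names, step size, and perturbation size are illustrative assumptions. Only the first-order (gradient) variant is sketched.

```python
import numpy as np


def spsa_gradient(cost, theta, delta, rng):
    """Two-sided simultaneous perturbation estimate of the gradient of cost at theta."""
    # Perturb all coordinates at once with independent +1/-1 signs.
    d = rng.choice([-1.0, 1.0], size=theta.shape)
    diff = cost(theta + delta * d) - cost(theta - delta * d)
    # Dividing by d_i is the same as multiplying by it, because d_i is +1 or -1.
    return diff / (2.0 * delta) * d


def toy_cost(theta):
    # Hypothetical stand-in for a policy evaluator: a quadratic minimized at theta = 1.
    return float(np.sum((theta - 1.0) ** 2))


def gradient_search(cost, theta0, steps=500, step_size=0.05, delta=0.1, seed=0):
    """Plain gradient descent driven by the simultaneous perturbation estimate."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - step_size * spsa_gradient(cost, theta, delta, rng)
    return theta


theta_hat = gradient_search(toy_cost, [0.0, 3.0])
```

Each estimate needs only two cost evaluations regardless of the number of policy parameters, which is the main appeal when every evaluation is expensive. In the setting described in the abstract, the evaluations would come from the batch of trajectories and would be biased, so this noiseless toy example only illustrates the mechanics.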
