Abstract

Gradient-based policy search algorithms benefit greatly from a well-estimated state or state-action value function, which can be used to reduce the variance of the gradient estimates. In addition, using Gaussian processes for value function approximation provides a fully probabilistic model in which the uncertainty of the estimated value function can be used to assess the amount of exploration required. In this article we present two methods for adjusting different characteristics of the exploration in on-line learning of control policies for problems with continuous state-action spaces. The proposed methods exploit the fully probabilistic nature of Gaussian processes and aim to restrict exploration to relevant subspaces, thereby speeding up convergence. We present experiments on a simulated control task to demonstrate the validity of our algorithms.
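
To illustrate how the uncertainty of a Gaussian process value estimate could inform the amount of exploration, the following is a minimal sketch and not the algorithm proposed in this article. It assumes a zero-mean GP with a squared-exponential kernel over state-action pairs, and a hypothetical rule, `exploration_std`, that scales the action noise with the posterior standard deviation. The kernel hyperparameters, the toy data, and all function names are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(X_train, y_train, X_query, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(rbf_kernel(X_query, X_query)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

def exploration_std(q_var, base_std=0.05, gain=1.0):
    """Hypothetical rule (an assumption): noise scale grows with value uncertainty."""
    return base_std + gain * np.sqrt(q_var)

rng = np.random.default_rng(0)

# Toy data (an assumption): visited state-action pairs (s, a) with noisy returns.
X_train = rng.uniform(-1.0, 1.0, size=(30, 2))
y_train = -np.sum(X_train**2, axis=1) + 0.1 * rng.standard_normal(30)

# One query inside the visited region and one far outside it.
X_query = np.array([[0.0, 0.0], [4.0, 4.0]])
q_mean, q_var = gp_posterior(X_train, y_train, X_query)
stds = exploration_std(q_var)
```

In this sketch the query far from the visited data receives a posterior variance close to the prior variance and therefore a larger exploration noise, while the query inside the visited region receives a smaller one.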
