Abstract

This paper presents three optimal control methodologies, all relying on neural networks for their universal approximation capabilities and on dynamic programming to replace the time-integral optimization with a succession of time-local optimizations, and applies them to the same elementary rendezvous problem. First, a simplified version of the backpropagation-through-time algorithm is presented as the most faithful implementation of dynamic programming when the optimal controller is approximated by a neural network (trained by gradient descent) and a model of the process is available. Relaxing the need for an explicit prior model of the process, reinforcement learning (RL) approaches, for both continuous and discrete controllers, are then described and tested on the rendezvous problem. The results and the numerous methodological difficulties we encountered are discussed. The most successful reinforcement learning approach is a connectionist implementation of Q-learning in which all Q-values are approximated by radial-basis-function networks. However, when searching for a continuous optimal controller, the price RL pays for the absence of a model turns out to be far from negligible in terms of methodological difficulties, lack of robustness, convergence time, and quality of the discovered solution.
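The abstract's most successful method, Q-learning with Q-values approximated by radial-basis-function networks, can be illustrated schematically. The sketch below is not the paper's implementation: the 1-D double-integrator "rendezvous" dynamics, the discrete thrust set, the Gaussian centers and widths, and all hyperparameters are assumptions chosen only to make the idea concrete and runnable.

    # A minimal sketch, assuming a hypothetical 1-D rendezvous toy problem,
    # of Q-learning with one radial-basis-function (RBF) network per discrete
    # action. Dynamics, centers, widths, and hyperparameters are illustrative
    # assumptions, not values from the paper.
    import numpy as np

    rng = np.random.default_rng(0)

    ACTIONS = np.array([-1.0, 0.0, 1.0])   # assumed discrete thrust levels
    DT = 0.1                               # integration step (assumption)

    def step(state, a):
        """One Euler step of the toy double integrator; cost penalizes miss."""
        pos, vel = state
        vel = np.clip(vel + DT * a, -2.0, 2.0)      # keep state inside RBF coverage
        pos = np.clip(pos + DT * vel, -2.0, 2.0)
        reward = -(pos**2 + vel**2)                 # assumed quadratic stage cost
        done = abs(pos) < 0.05 and abs(vel) < 0.05  # "rendezvous" reached
        return np.array([pos, vel]), reward, done

    # Fixed RBF features over (position, velocity); Q(s, a) = W[a] . phi(s),
    # i.e. each action's Q-value is a linear combination of Gaussian bumps.
    centers = np.array([[p, v] for p in np.linspace(-2, 2, 7)
                               for v in np.linspace(-2, 2, 7)])
    SIGMA = 0.5

    def phi(state):
        d2 = np.sum((centers - state) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * SIGMA**2))

    W = np.zeros((len(ACTIONS), len(centers)))      # one weight vector per action
    GAMMA, ALPHA, EPS = 0.95, 0.05, 0.1             # discount, step size, exploration

    for episode in range(500):
        s = rng.uniform(-1.5, 1.5, size=2)
        for t in range(200):
            f = phi(s)
            q = W @ f
            a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(np.argmax(q))
            s2, r, done = step(s, ACTIONS[a])
            # Q-learning target: a time-local (one-step) optimization
            target = r if done else r + GAMMA * np.max(W @ phi(s2))
            W[a] += ALPHA * (target - q[a]) * f     # TD gradient step on taken action
            s = s2
            if done:
                break

Because the RBF centers are fixed, each Q-function is linear in its weights, so the temporal-difference update reduces to a single gradient step per transition; this is one common way to realize the connectionist Q-learning the abstract describes, though the paper's actual architecture and problem setup may differ.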

