Abstract

Optimization theory provides a framework for determining the best decisions or actions with respect to some mathematical model of a process. This paper focuses on learning to act in a near-optimal manner, through reinforcement learning, for problems that either lack a model or whose model is too complex. One approach to solving this class of problems is approximate dynamic programming, but these methods are established primarily for discrete state and action spaces. In this paper we develop efficient learning methods that act in complex systems with continuous state and action spaces. Monte-Carlo approaches are employed to estimate function values in an iterative, incremental procedure. Derivative-free line search methods are used to obtain a near-optimal action in the continuous action space for a discrete subset of the state space. This near-optimal control policy is then extended to the entire continuous state space via a fuzzy additive model. To compensate for approximation errors, a modified procedure for perturbing the generated control policy is developed. Convergence results under moderate assumptions and stopping criteria are established.
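
As an informal illustration of the pipeline described above (Monte-Carlo estimation of action values, a derivative-free line search over a continuous action interval, and a policy maintained on a discrete subset of the state space), the following Python sketch may help fix ideas. It is not the authors' algorithm: the dynamics, reward, discount factor, state grid, and the choice of golden-section search as the derivative-free method are all illustrative assumptions, and the fuzzy additive extension of the policy is replaced by a simple nearest-grid-point lookup for brevity.

```python
# Hypothetical sketch only; placeholder dynamics and parameters, not the paper's method.
import math
import random

GAMMA = 0.95        # assumed discount factor
N_ROLLOUTS = 50     # Monte-Carlo rollouts per (state, action) evaluation
HORIZON = 30        # rollout length

def step(state, action):
    """Toy stochastic dynamics and reward (placeholders for the unknown model)."""
    next_state = state + action + random.gauss(0.0, 0.05)
    reward = -(next_state ** 2) - 0.1 * action ** 2
    return next_state, reward

def policy_action(state, policy):
    """Nearest-grid-point policy lookup, standing in for the fuzzy additive model."""
    nearest = min(policy, key=lambda s: abs(s - state))
    return policy[nearest]

def mc_value(state, action, policy):
    """Monte-Carlo estimate of Q(state, action) when following `policy` afterwards."""
    total = 0.0
    for _ in range(N_ROLLOUTS):
        s, a, ret, disc = state, action, 0.0, 1.0
        for _ in range(HORIZON):
            s, r = step(s, a)
            ret += disc * r
            disc *= GAMMA
            a = policy_action(s, policy)
        total += ret
    return total / N_ROLLOUTS

def golden_section_search(f, lo, hi, tol=1e-2):
    """Derivative-free line search maximizing f on the interval [lo, hi]."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

# One policy-improvement sweep over a discrete subset of the state space.
state_grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
policy = {s: 0.0 for s in state_grid}
for s in state_grid:
    policy[s] = golden_section_search(lambda a: mc_value(s, a, policy), -1.0, 1.0)
print(policy)
```

In this sketch each sweep improves the action at every grid state by a one-dimensional search against a noisy Monte-Carlo objective; in practice the rollout count, the accuracy of the line search, and the policy-perturbation step mentioned in the abstract would govern how the approximation errors are controlled.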
