Abstract

Perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) share a common goal: to make decisions that improve system performance based on information obtained by analyzing the current system behavior. In this paper, we study the relations among these closely related fields. We show that MDP solutions can be derived naturally from the performance sensitivity analysis provided by PA. The performance potential plays an important role in both PA and MDPs; it also offers a clear intuitive interpretation for many results. Reinforcement learning, TD(λ), neuro-dynamic programming, etc., are efficient ways of estimating the performance potentials and related quantities from sample paths. This new view of PA, MDPs, and RL leads to a gradient-based policy iteration method that can be applied to some nonstandard optimization problems, such as those with correlated actions. Sample-path-based approaches are also discussed.

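To make the sample-path idea concrete, the following is a minimal sketch, in the spirit of the TD-style estimation mentioned above, of how the performance potentials of an ergodic Markov chain might be estimated from a single simulated trajectory. The transition matrix P, reward vector f, step size ALPHA, and horizon T below are illustrative assumptions, not quantities taken from the paper, and the update shown is a generic TD(0)-style stochastic approximation for the Poisson equation rather than the authors' specific algorithm.

```python
import numpy as np

# Illustrative example (assumed values): estimate the performance potentials
# g(i) and the average reward eta of an ergodic Markov chain from one sample
# path, using a TD(0)-style update for the Poisson equation
#   (I - P) g = f - eta * e.

rng = np.random.default_rng(0)

P = np.array([[0.5, 0.3, 0.2],        # example transition probabilities
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, 0.0, 2.0])         # example per-state rewards
n_states = len(f)

ALPHA = 0.01                          # step size (assumed)
T = 200_000                           # sample-path length (assumed)

g = np.zeros(n_states)                # potential estimates
eta = 0.0                             # running estimate of the average reward
x = 0                                 # current state

for t in range(1, T + 1):
    x_next = rng.choice(n_states, p=P[x])
    eta += (f[x] - eta) / t           # running average of observed rewards
    # Temporal-difference update toward the Poisson-equation fixed point.
    td_error = f[x] - eta + g[x_next] - g[x]
    g[x] += ALPHA * td_error
    x = x_next

# Potentials are only defined up to an additive constant; here we simply
# shift them to have zero mean for readability.
g -= g.mean()
print("estimated average reward:", eta)
print("estimated potentials:   ", g)
```

The resulting estimates of g can then feed a policy-improvement or gradient step, which is the link between sensitivity-based (PA) and policy-iteration-based (MDP) optimization that the paper develops.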