From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning

Xi-Ren Cao

doi:10.1023/a:1022188803039

Abstract

Perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) share a common goal: to make decisions that improve system performance based on information obtained by analyzing the current system behavior. In this paper, we study the relations among these closely related fields. We show that MDP solutions can be derived naturally from the performance sensitivity analysis provided by PA. Performance potentials play an important role in both PA and MDPs; they also offer a clear, intuitive interpretation of many results. Reinforcement learning, TD(λ), neuro-dynamic programming, and related methods are efficient ways of estimating the performance potentials and related quantities from sample paths. The sensitivity point of view of PA, MDPs, and RL brings new insight to the area of learning and optimization. In particular, gradient-based optimization can be applied to parameterized systems with large state spaces, and gradient-based policy iteration can be applied to some nonstandard MDPs, such as systems with correlated actions. Potential-based on-line approaches and their advantages are also discussed.
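To make the role of performance potentials concrete, the following is a minimal sketch, not code from the paper. It assumes a finite ergodic, aperiodic Markov chain with transition matrix P and reward vector f under the long-run average reward criterion, and it takes the potentials g to be a solution of the Poisson equation (I - P) g + η e = f, normalized so that g[0] = 0. The two-state chain, the numbers, and all function names are hypothetical choices made for illustration. The sketch checks numerically the standard average-reward difference formula η' - η = π' [(P' - P) g + (f' - f)], which is the kind of sensitivity relation from which policy improvement can be derived.

```python
# Illustrative sketch only: the chain, rewards, and names are assumptions.

def stationary(P, iters=2000):
    """Stationary distribution by power iteration (ergodic, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_reward(P, f):
    """Long-run average reward eta = pi . f."""
    return sum(p * r for p, r in zip(stationary(P), f))


def potentials(P, f, iters=2000):
    """Solve (I - P) g + eta * e = f by fixed-point iteration, with g[0] = 0."""
    n = len(P)
    eta = average_reward(P, f)
    g = [0.0] * n
    for _ in range(iters):
        g = [f[i] - eta + sum(P[i][j] * g[j] for j in range(n)) for i in range(n)]
        g = [x - g[0] for x in g]  # potentials are defined up to a constant
    return g


# Original system (hypothetical numbers).
P = [[0.7, 0.3], [0.4, 0.6]]
f = [1.0, 3.0]

# Perturbed system: only the transition row of state 0 changes.
P_new = [[0.5, 0.5], [0.4, 0.6]]
f_new = [1.0, 3.0]

eta = average_reward(P, f)
eta_new = average_reward(P_new, f_new)
g = potentials(P, f)
pi_new = stationary(P_new)

# Difference formula: eta' - eta = pi' [(P' - P) g + (f' - f)].
n = len(P)
predicted = sum(
    pi_new[i]
    * (sum((P_new[i][j] - P[i][j]) * g[j] for j in range(n)) + f_new[i] - f[i])
    for i in range(n)
)

print(eta_new - eta, predicted)  # the two values should agree
```

Because π' is componentwise positive for an ergodic chain, the sign of each bracketed term already indicates whether a change improves the average reward, and the potentials g needed to evaluate it come from the original system alone. This is one way to read the abstract's claim that MDP solutions follow from sensitivity analysis.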

Journal: Discrete Event Dynamic Systems
Publication Date: Jan 1, 2003
Citations: 43
