Abstract

It is known that policy iteration can be identified with Newton's method (and value iteration with successive approximation) for solving Bellman's optimality equation, which for Markov decision problems takes the form

$$F(v(i)) = \max_{u \in U}\Big[\tilde r_u(i) + \alpha \sum_{j=1}^{N} P_u(i,j)\, v(j)\Big] - v(i) = 0, \qquad i = 1, \dots, N.$$

One is naturally led to consider what new computational methods might be suggested by adopting alternative root-finding procedures. This paper summarises an investigation of the consequences of adopting one particular root-finding scheme called the method of continuity (also known as the method of imbedding or homotopy).
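As a rough illustration of the identifications named in the abstract, the sketch below sets up a small random MDP and solves $F(v) = 0$ in two ways: policy iteration written explicitly as Newton's method, and a simple continuation pass whose inner corrector at $t = 1$ coincides with value iteration (successive approximation). This is a minimal sketch, not code from the paper; the particular homotopy $G_t(v) = t\,F(v) + (1-t)(v_0 - v)$ and the fixed-point corrector used to track its root are assumptions made for this example, and the paper's own imbedding may differ.

```python
# Minimal sketch (illustrative, not from the paper): a small random MDP,
# policy iteration written as Newton's method on the Bellman residual F,
# and a method-of-continuity (homotopy) pass. The homotopy
#   G_t(v) = t*F(v) + (1 - t)*(v0 - v)
# and the fixed-point corrector are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(0)
N, A, alpha = 4, 3, 0.9                  # states, actions, discount factor

P = rng.random((A, N, N))
P /= P.sum(axis=2, keepdims=True)        # P[u, i, j] = P_u(i, j); rows sum to 1
r = rng.random((A, N))                   # r[u, i] = r_u(i)

def F(v):
    """Bellman residual: max_u [ r_u(i) + alpha * sum_j P_u(i,j) v(j) ] - v(i)."""
    q = r + alpha * (P @ v)              # q[u, i]
    return q.max(axis=0) - v

def policy_iteration(v, tol=1e-12):
    """Newton's method on F: take the greedy policy u, then jump to the exact
    root of the linearised residual by solving (I - alpha * P_u) v = r_u."""
    for _ in range(100):
        u = (r + alpha * (P @ v)).argmax(axis=0)   # greedy policy
        Pu = P[u, np.arange(N), :]                 # transition matrix under u
        ru = r[u, np.arange(N)]                    # reward vector under u
        v_new = np.linalg.solve(np.eye(N) - alpha * Pu, ru)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

def continuity_method(v0, steps=20, inner=50):
    """Track the root of G_t(v) = t*F(v) + (1-t)*(v0 - v) as t goes 0 -> 1.
    The corrector v <- v + G_t(v) is a contraction with modulus t*alpha < 1,
    and at t = 1 it is exactly value iteration (successive approximation)."""
    v = v0.copy()
    for t in np.linspace(0.0, 1.0, steps + 1)[1:]:
        for _ in range(inner):
            v = v + t * F(v) + (1.0 - t) * (v0 - v)
    return v

v0 = np.zeros(N)
print("Newton / policy iteration residual:", np.max(np.abs(F(policy_iteration(v0)))))
print("continuity / homotopy residual:   ", np.max(np.abs(F(continuity_method(v0)))))
```

The linear solve in `policy_iteration` is what makes each step a Newton step: linearising $F$ at the greedy policy and jumping to the root of that linearisation is the same operation as exact policy evaluation.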
