Abstract

For a given vector x_0, the sequence {x_t} which optimizes the sum of discounted rewards r(x_t, x_{t+1}), where r is a quadratic function, is shown to be generated by a linear decision rule x_{t+1} = S x_t + R. Moreover, the coefficients R, S are given by explicit formulas in terms of the coefficients of the reward function r. A unique steady-state is shown to exist (except for a degenerate case), and its stability is discussed.
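To make the linear decision rule concrete, the following minimal Python sketch iterates x_{t+1} = S x_t + R and checks that the limit is a fixed point, i.e. a steady state satisfying x* = S x* + R. The values of S, R and x_0 are hypothetical and chosen only for illustration; they are not derived from any reward function r, and the abstract's explicit formulas for R and S are not reproduced here. The helper names `step` and `simulate` are likewise assumptions.

```python
def step(S, R, x):
    """Apply the linear decision rule once: returns S x + R."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) + r_i
            for row, r_i in zip(S, R)]


def simulate(S, R, x0, T):
    """Return the path [x_0, x_1, ..., x_T] generated by the rule."""
    path = [list(x0)]
    for _ in range(T):
        path.append(step(S, R, path[-1]))
    return path


# Hypothetical coefficients (illustration only, not taken from the paper).
# S is upper triangular, so its eigenvalues are the diagonal entries 0.5 and 0.4;
# both lie inside the unit circle, so the iteration converges.
S = [[0.5, 0.1],
     [0.0, 0.4]]
R = [1.0, 3.0]
x0 = [0.0, 0.0]

path = simulate(S, R, x0, 200)
x_star = path[-1]

# A steady state must satisfy x* = S x* + R; measure how far we are from that.
residual = max(abs(a - b) for a, b in zip(step(S, R, x_star), x_star))
print(x_star, residual)
```

For these illustrative values the path settles at approximately (3, 5), which solves (I - S) x* = R. If some eigenvalue of S had modulus greater than one, the same loop would generally diverge from the steady state rather than approach it, which is the sense in which stability depends on S.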

