Abstract

There are numerous applications, in many different fields, of denumerable controlled Markov chain (CMC) models with an infinite planning horizon; see Bertsekas (1987), Ephremides and Verdu (1989), Ross (1983), Stidham and Weber (1993), and Tijms (1986). The authors consider the stochastic control problem of maximizing long-run average rewards for denumerable CMCs. Departing from the most common approach, which works with expected values of rewards, the authors focus on a sample path analysis of the stream of states and actions. Under a Lyapunov function condition, they show that stationary policies obtained from the average reward optimality equation are not only expected average reward optimal but indeed sample path average reward optimal.
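For orientation, the objects named in the abstract can be written out in the standard notation for denumerable CMCs; the notation below (state space S, admissible actions A(x), one-step reward r(x,a), transition law p(y | x,a)) is the generic one and is assumed here rather than quoted from the paper. The average reward optimality equation then reads

\[
\rho^* + h(x) \;=\; \max_{a \in A(x)} \Big[\, r(x,a) + \sum_{y \in S} p(y \mid x, a)\, h(y) \,\Big], \qquad x \in S,
\]

where \(\rho^*\) is the optimal average reward and \(h\) is a relative value (bias) function. A stationary policy \(f^*\) choosing a maximizing action in each state is expected average reward optimal; the result described in the abstract is that, under the paper's Lyapunov function condition, such a policy is optimal in the stronger sample path sense, which roughly means

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(x_t, a_t) \;=\; \rho^* \quad \text{almost surely under } f^*,
\]

while no policy can exceed \(\rho^*\) in the corresponding \(\limsup\) sense along almost every trajectory.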
