Abstract
Consider a countable-state controlled Markov chain whose transition probability is specified up to an unknown parameter $\alpha$ taking values in a compact metric space $A$. To each $\alpha$ is associated a prespecified stationary control law $\zeta(\alpha)$. The adaptive control law selects at each time $t$ the control action $\zeta(\alpha_t, x_t)$, where $x_t$ is the state and $\alpha_t$ is the maximum likelihood estimate of $\alpha$. The asymptotic behavior of this control scheme is investigated for the cases when the true parameter value $\alpha_0$ does or does not belong to $A$, and for the case when $\zeta$ is chosen to minimize an average cost criterion. The analysis uses an appropriate extension of the notions of recurrence to nonstationary Markov chains.
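As a rough sketch of the scheme described above, the estimate and the control action at time $t$ can be written as follows. The notation $p(x, y; u, \alpha)$ for the probability of moving from state $x$ to state $y$ under action $u$ and parameter $\alpha$, and the symbol $u_s$ for the action applied at time $s$, are assumptions introduced here for illustration and do not appear in the abstract; the choice among multiple maximizers is likewise left unspecified.

$$
\alpha_t \in \arg\max_{\alpha \in A} \prod_{s=0}^{t-1} p(x_s, x_{s+1}; u_s, \alpha), \qquad u_t = \zeta(\alpha_t, x_t).
$$

Because $u_t$ depends on the estimate $\alpha_t$, which itself depends on the observed history, the resulting state process is in general not a stationary Markov chain.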