Abstract

This paper is a study of decision making in a discrete-state, discrete-time system whose state transitions constitute a Markov chain with unknown stationary transition matrix P. The states of the system cannot be observed. The decision at each stage is based on observables whose conditional probability distribution given the state of the system is known. We consider a class of problems in which the successive observations can be employed to form estimates of P, with the estimate at time n, n = 0, 1, 2, …, then used as a basis for making a decision at time n. The estimates and the corresponding decisions must have the property that as n → ∞, the decision based on the estimate of P tends to the optimal decision rule which would be used throughout if P were known.
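The scheme described above can be illustrated with a minimal sketch. This is not the paper's construction: for simplicity the chain is taken as fully observed (the paper's setting has hidden states observed through a known conditional distribution), the two-state transition matrix P_TRUE, the empirical-count estimator, and the toy decision rule are all hypothetical stand-ins. It shows only the certainty-equivalence loop: at each time n, re-estimate P and base the decision on the current estimate, which converges to the decision under the true P as n grows.

```python
import random

# Hypothetical true transition matrix (unknown to the decision maker).
P_TRUE = [[0.9, 0.1],
          [0.4, 0.6]]

def step(state, P, rng):
    """Sample the next state from row `state` of P."""
    return 0 if rng.random() < P[state][0] else 1

def estimate(counts):
    """Empirical transition-matrix estimate from transition counts
    (add-one smoothing keeps rows well-defined early on)."""
    return [[(counts[i][j] + 1) / (sum(counts[i]) + 2) for j in range(2)]
            for i in range(2)]

def decision(P_hat):
    """Toy stand-in for an optimal decision rule: choose the state that
    is more probable under the stationary distribution implied by P_hat."""
    # For a 2-state chain, pi_0 = p10 / (p01 + p10).
    p01, p10 = P_hat[0][1], P_hat[1][0]
    pi0 = p10 / (p01 + p10)
    return 0 if pi0 >= 0.5 else 1

rng = random.Random(0)
counts = [[0, 0], [0, 0]]
state = 0
for n in range(20000):
    nxt = step(state, P_TRUE, rng)
    counts[state][nxt] += 1
    state = nxt
    P_hat = estimate(counts)   # estimate of P at time n
    d = decision(P_hat)        # decision based on that estimate

print(P_hat)  # approaches P_TRUE as n grows
print(d)
```

As n increases, P_hat approaches P_TRUE, so the decision d based on the estimate eventually coincides with the decision that would have been made had P been known throughout, which is the convergence property the abstract requires.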
