Abstract

In conjunction with the problem of transforming a given optimization problem into a form from which the functional equations of dynamic programming are obtainable, Karp and Held (1967) clarified the relation between a certain class of decision processes and dynamic programming from the viewpoint of automata theory. This paper follows the line of Karp and Held and presents a number of new concepts. First, we assume that a given optimization problem is discrete and deterministic, that is, it is given in the form of a discrete decision process (ddp). We then define six classes of decision processes: sdp (sequential decision process), msdp (monotone sdp), smsdp (strictly monotone sdp), pmsdp (positively monotone sdp), ap (additive process), and lmsdp (loop-free msdp). The sdp is considered a general model of a decision process with a finite number of states. The msdp is a subclass of sdp's from which the functional equations of dynamic programming are obtainable. The smsdp, pmsdp, ap, and lmsdp are subclasses of msdp's with simpler structures than the msdp; accordingly, simpler methods for solving the resulting functional equations are available for these subclasses. Two types of representation theorems are first proved for each class of decision processes. One is the w (weak)-representation theorem, which gives a necessary and sufficient condition for a given ddp to be realized by a decision process of the specific class in the sense that both have the same set of optimal policies. The other is the s (strong)-representation theorem, which additionally requires that the cost value of each feasible policy coincide. Based on the w-representation theorems, various properties of sets of optimal policies are investigated for each class. In particular, it is shown that although the sets of optimal policies of sdp and msdp are not closed under most operations, the corresponding sets for smsdp, pmsdp, ap, and lmsdp are closed.
In fact, a set of policies can be a set of optimal policies of an smsdp, pmsdp, or ap if and only if it is regular (i.e., accepted by a finite automaton). For an lmsdp, a set can be a set of optimal policies if and only if it is finite.
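
As a rough illustration of the kind of object discussed above, the following Python sketch models a toy loop-free additive process as a finite automaton whose transitions carry costs, and computes its set of optimal policies with a cost-to-go recursion. Everything in it is an assumption made for illustration: the states, decisions, costs, and the names `TRANSITIONS`, `INITIAL`, `FINAL`, and `best_from` are hypothetical, and the backward recursion is one common form of a dynamic programming functional equation, not necessarily the formulation used in the paper.

```python
from functools import lru_cache

# Hypothetical toy additive process (all names and numbers are illustrative).
# TRANSITIONS[state][decision] = (next_state, cost added by that decision).
# The transition graph has no cycles, so the process is loop-free.
TRANSITIONS = {
    "q0": {"a": ("q1", 2), "b": ("q2", 1)},
    "q1": {"a": ("q3", 1)},
    "q2": {"a": ("q3", 2), "b": ("q3", 5)},
    "q3": {},
}
INITIAL = "q0"
FINAL = {"q3"}


@lru_cache(maxsize=None)
def best_from(state):
    """Return (minimum cost, tuple of optimal decision strings) from `state`
    to some final state, using a cost-to-go recursion."""
    if state in FINAL:
        # The empty policy is feasible here and costs nothing.
        best_cost, best_policies = 0, [""]
    else:
        best_cost, best_policies = float("inf"), []
    for decision, (nxt, cost) in TRANSITIONS[state].items():
        tail_cost, tails = best_from(nxt)
        total = cost + tail_cost  # additive cost structure
        extended = [decision + t for t in tails]
        if total < best_cost:
            best_cost, best_policies = total, extended
        elif total == best_cost:
            best_policies = best_policies + extended
    return best_cost, tuple(sorted(best_policies))


if __name__ == "__main__":
    cost, policies = best_from(INITIAL)
    print(cost, policies)
```

In this toy instance the policies `aa` and `ba` both reach the final state at cost 3, so the set of optimal policies is the finite set {`aa`, `ba`}, in line with the finiteness statement for loop-free processes.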
