If the state transitions of a nondeterministic or stochastic automaton are rewarded, the question arises whether the automaton can adopt a policy that guarantees a maximal, or nearly maximal, return. This problem arises, for example, when one seeks an optimal prediction of the next state of a stochastic automaton, or when optimal learning strategies are sought and optimality is measured with respect to a given learning goal. This paper shows that, under mild conditions, such optimal or nearly optimal state transition policies exist. The result is first established for stochastic automata; by means of a representation of nondeterministic automata by stochastic ones (a result that seems to be of interest in its own right), it then carries over to the nondeterministic case. The methods and main auxiliary results come from the theory of set-valued maps.