Abstract

Stochastic automata models have been used successfully in the past to model learning systems. A variable-structure automaton reacts to inputs from a random environment by changing the probabilities of its actions. These changes are carried out by a reinforcement scheme, in such a manner that the automaton evolves to a final structure that is satisfactory in some sense. Several reinforcement schemes have been proposed in the literature for updating the structure of automata [1–4]. Most of these are expedient schemes, which in the limit yield structures that perform better than a device choosing its actions with equal probabilities irrespective of the environment's response. A few schemes suggested recently, called optimal schemes, in the limit lead to the continual selection of a single optimal action as the output of the automaton when it operates in a stationary environment [5–7]. The question naturally arises as to which of these schemes is to be preferred ...
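
The abstract does not give the update rules themselves. As one concrete instance of the optimal schemes it mentions, the sketch below implements the classical linear reward-inaction (L_R-I) scheme, under which the action probabilities converge to the repeated selection of a single action in a stationary environment. The environment model, the learning-rate value, and all function names here are illustrative assumptions, not taken from the paper.

```python
import random

def lri_update(p, chosen, beta, a):
    """Linear reward-inaction (L_R-I) update.

    p      -- current action-probability vector (sums to 1)
    chosen -- index of the action just performed
    beta   -- environment response: 0 = reward, 1 = penalty
    a      -- learning-rate parameter in (0, 1)

    On reward, probability mass is shifted toward the chosen action;
    on penalty, the probabilities are left unchanged (the "inaction").
    """
    if beta == 0:  # favorable response: reinforce the chosen action
        p = [(1 - a) * pj for pj in p]
        p[chosen] += a
    return p

def run(c, steps=10_000, a=0.05):
    """Simulate the automaton in a stationary random environment,
    where c[i] is the (fixed) penalty probability of action i."""
    r = len(c)
    p = [1.0 / r] * r  # initial structure: all actions equally probable
    for _ in range(steps):
        action = random.choices(range(r), weights=p)[0]
        beta = 1 if random.random() < c[action] else 0
        p = lri_update(p, action, beta, a)
    return p

if __name__ == "__main__":
    # Hypothetical environment: action 1 has the lowest penalty probability.
    print(run(c=[0.7, 0.2, 0.5]))
```

For a small learning rate, the probability vector converges with high probability to selecting the action with the lowest penalty probability, which is the limiting behavior the abstract attributes to optimal schemes.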
