Abstract

A stochastic automaton responds to the penalties from a random environment through a reinforcement scheme by changing its state probability distribution in such a way as to reduce the average penalty received. In this manner the automaton is said to possess a variable structure and the ability to learn. This paper discusses the efficiency of learning for an m-state automaton in terms of expediency and convergence, under two distinct types of reinforcement schemes: one based on penalty probabilities and the other on penalty strengths. The functional relationship between the successive probabilities in the reinforcement scheme may be either linear or nonlinear. The stability of the asymptotic expected values of the state probabilities is discussed in detail, and the conditions for optimal and expedient behavior of the automaton are derived. Reduction of the probability of suboptimal performance by adopting the Beta model of mathematical learning theory is discussed. Convergence is examined through an analysis of variance, with the initial learning rate used as a measure of the overall convergence rate. Learning curves can be obtained by solving nonlinear difference equations relating the successive expected values, and an analytic expression for the convergence behavior of the linear case is derived. It is shown that a suitable choice of the reinforcement scheme can increase the separation of the asymptotic state probabilities.

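To make the setting concrete, the sketch below simulates one common form of linear reinforcement for an m-state variable-structure automaton. It is an illustration under assumed details, not the paper's exact scheme: the penalty probabilities `c`, the reward and penalty step sizes `a` and `b`, and the simulation length are all hypothetical choices; the paper itself also treats nonlinear schemes and schemes driven by penalty strengths.

```python
# Minimal sketch (assumed parameters): an m-state automaton in a stationary
# random environment. Choosing state i incurs a penalty with probability c[i];
# a linear reward-penalty update then shifts the state-probability vector p
# so that the average penalty received tends to decrease.

import random

def linear_reward_penalty(p, i, penalized, a=0.05, b=0.05):
    """Return the updated probability vector after the automaton used state i.

    On reward (no penalty), probability mass moves toward state i;
    on penalty, mass moves away from i toward the other m-1 states.
    Both branches keep the vector normalized.
    """
    m = len(p)
    q = list(p)
    if not penalized:                      # reward: reinforce the chosen state
        for j in range(m):
            q[j] = (1 - a) * p[j]
        q[i] = p[i] + a * (1 - p[i])
    else:                                  # penalty: redistribute to the other states
        for j in range(m):
            q[j] = b / (m - 1) + (1 - b) * p[j]
        q[i] = (1 - b) * p[i]
    return q

def simulate(c, steps=20000, seed=0):
    """Run the automaton against penalty probabilities c; return the final p."""
    random.seed(seed)
    m = len(c)
    p = [1.0 / m] * m                      # start from the uniform distribution
    for _ in range(steps):
        i = random.choices(range(m), weights=p)[0]
        penalized = random.random() < c[i]
        p = linear_reward_penalty(p, i, penalized)
    return p

if __name__ == "__main__":
    c = [0.7, 0.4, 0.2]                    # the third state has the smallest penalty probability
    print(simulate(c))                     # its probability should dominate: expedient behavior
```

Under this kind of scheme the asymptotic probability of the least-penalized state exceeds 1/m (expediency); the paper's analysis concerns when such behavior can be made optimal and how the step sizes govern the variance and rate of convergence.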