Abstract

Learning automata arranged in a two-level hierarchy are considered. The automata operate in a stationary random environment and update their action probabilities according to the linear-reward-ε-penalty algorithm at each level. Unlike some hierarchical systems previously proposed, no information transfer exists from one level to another, and yet the hierarchy possesses good convergence properties. Using weak-convergence concepts it is shown that for large time and small values of parameters in the algorithm, the evolution of the optimal path probability can be represented by a diffusion whose parameters can be computed explicitly.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call