Abstract

Learning automata (LA) are a powerful tool for reinforcement learning in artificial intelligence. A key issue in evaluating LA has always been the trade-off between "accuracy" and "speed", which essentially comes down to parameter tuning. Owing to environmental randomness and complexity, parameter tuning typically requires millions of interactions, incurring tremendous expense. To avoid this fatal flaw, this paper proposes a novel parameterless learning automaton named LFPLA. It has the intriguing property of not relying on manually configured parameters, and it possesses ϵ-optimality in any stationary random environment. A distinctive innovation lies in a newly defined loss function, which replaces the probability vector maintained in conventional LA. Furthermore, a series of sampling strategies is designed for action selection, and a sufficiently small threshold serves as the iteration-termination condition. In addition to proving its advantageous performance through detailed mathematical proofs, we carried out extensive Monte Carlo experiments demonstrating its effectiveness in both two-action and multi-action benchmark environments. The proposed LFPLA converges faster and with higher accuracy than GBLA, currently the only other parameter-free LA. Moreover, it outperforms state-of-the-art multi-action LA, especially in complex and confusing environments. Above all, it offers a unique benefit in terms of both tuning cost and interaction cost.
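The abstract does not give LFPLA's algorithm. Purely as an illustration of the general idea it describes, namely a learning loop with no hand-tuned learning rate, sampling-based action selection, and a threshold-based stopping rule, the following is a hypothetical sketch. It uses Thompson-style Beta sampling as a stand-in for the paper's sampling strategies; the function name, the `threshold` stopping rule, and the environment interface are all assumptions, not the authors' method.

```python
import random

def parameterless_la(env, n_actions, threshold=1e-3, max_iter=100_000):
    """Hypothetical sketch of a parameterless learning-automaton loop.

    `env(action)` returns 1 on reward, 0 on penalty. No learning rate is
    tuned: each action keeps empirical reward/penalty counts, actions are
    chosen by sampling from each action's Beta posterior, and iteration
    stops once one action's selection frequency exceeds 1 - threshold.
    """
    rewards = [1] * n_actions    # Beta prior: one pseudo-reward per action
    penalties = [1] * n_actions  # Beta prior: one pseudo-penalty per action
    counts = [0] * n_actions     # how often each action was selected
    for t in range(1, max_iter + 1):
        # Sample an action: draw from each action's Beta posterior
        # and pick the action with the largest draw.
        draws = [random.betavariate(rewards[a], penalties[a])
                 for a in range(n_actions)]
        a = max(range(n_actions), key=lambda i: draws[i])
        counts[a] += 1
        if env(a):
            rewards[a] += 1
        else:
            penalties[a] += 1
        # Termination: one action dominates the selection history.
        if t > 100 and max(counts) / t > 1 - threshold:
            break
    return max(range(n_actions), key=lambda i: counts[i])
```

In a stationary two-action environment with distinct reward probabilities, the loop concentrates its selections on the better action, and the frequency threshold plays the role the abstract assigns to the "sufficiently small threshold" for terminating iteration.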
