Abstract

The learning automaton is considered one of the most potent tools in reinforcement learning. The family of estimator algorithms was proposed to improve the convergence rate of learning automata and has achieved significant results. However, estimators perform poorly at estimating actions' reward probabilities in the initial stage of the learning process; as a result, many rewards are assigned to nonoptimal actions, and numerous extra iterations are required to compensate for these wrongly placed rewards. To further improve the speed of convergence, we propose a new P-model absorbing learning automaton that uses a double competitive strategy to update the action probability vector. The proposed scheme overcomes the drawbacks of the existing action probability vector updating strategy. Extensive experimental results in benchmark environments demonstrate that the proposed learning automaton performs more effectively than the classic learning automaton \(SE_{RI}\) and the currently fastest learning automaton \(DGCPA^{*}\).
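For readers unfamiliar with the estimator family, the sketch below illustrates the classic pursuit-style estimator update in a P-model environment: the automaton keeps maximum-likelihood estimates of each action's reward probability and moves its action probability vector toward the action with the best current estimate. This is a generic reference implementation, not the proposed double competitive strategy; the environment's reward probabilities, the learning rate `rate`, and the function name `pursuit_la` are illustrative assumptions.

```python
import numpy as np

def pursuit_la(env_reward_probs, rate=0.01, steps=10000, seed=0):
    """Classic pursuit estimator automaton in a P-model environment.

    env_reward_probs: true reward probability of each action
    (property of the environment, unknown to the automaton).
    rate: learning rate for the action-probability-vector update.
    """
    rng = np.random.default_rng(seed)
    r = len(env_reward_probs)
    p = np.full(r, 1.0 / r)   # action probability vector, uniform start
    chosen = np.zeros(r)      # how often each action was selected
    rewarded = np.zeros(r)    # how often each action was rewarded

    for _ in range(steps):
        a = rng.choice(r, p=p)                      # sample an action
        beta = rng.random() < env_reward_probs[a]   # P-model feedback: 0/1
        chosen[a] += 1
        rewarded[a] += beta

        # Maximum-likelihood estimates of the reward probabilities
        # (unchosen actions keep estimate 0 to avoid division by zero).
        d_hat = rewarded / np.maximum(chosen, 1.0)

        # Pursue the action with the best current estimate:
        # shift p toward the unit vector of that action.
        e = np.zeros(r)
        e[np.argmax(d_hat)] = 1.0
        p = (1.0 - rate) * p + rate * e

    return p

print(pursuit_la([0.65, 0.5, 0.45, 0.4, 0.35]))
```

The poor early-stage estimates the abstract criticizes are visible here: while `chosen` counts are small, `d_hat` is noisy, so probability mass is pulled toward nonoptimal actions and must later be undone.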
