Abstract

We recently proposed swarm reinforcement learning methods in which multiple agents learn not only individually but also by exchanging information with one another. So far, these methods have been applied only to a problem in a discrete state-action space, with Q-learning used for the individual learning. Although many reinforcement learning studies address problems in discrete state-action spaces, most real-world tasks require a continuous state-action space. This paper proposes a swarm reinforcement learning method based on an actor-critic method in order to acquire optimal policies rapidly for problems in continuous state-action spaces. The proposed method is applied to an inverted pendulum control problem, and its performance is examined through numerical experiments.
