Abstract

Value estimation is a critical problem in value-based reinforcement learning. Most related studies focus on using multiple critics to reduce estimation bias and seldom consider the impact of multiple actors on value estimation. This paper proposes a multi-actor mechanism (MAM) for Actor-Critic reinforcement learning that provides multiple behavior choices in the same state, yielding diverse Q-values that carry richer information and enhance the agent's exploration capability. MAM comprises two techniques. The first is an obsolescence technique, which quickly generates high-quality experience to help the agent find the optimal policy. The second is a Q-value weighting technique, which leverages multiple Q-values to achieve more accurate value estimation. MAM is general and can be applied to any Actor-Critic reinforcement learning algorithm. Specifically, we embed MAM into DDPG and TD3 and demonstrate that it mitigates estimation bias, enhances exploration, and yields state-of-the-art results on various MuJoCo tasks, including the challenging Humanoid-v2 and Walker2d-v2 benchmarks.
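The abstract does not spell out how the Q-values from the different actors are combined. The following is a minimal NumPy sketch of one plausible reading of the idea: several actors each propose an action in the same state, a critic scores every proposal, and the resulting Q-values are blended with softmax weights. The linear actor/critic stand-ins, the softmax weighting, and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, temperature=1.0):
    # Numerically stable softmax over a 1-D array of scores.
    z = (x - x.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

# Hypothetical stand-ins: in the paper, the actors and the critic are neural
# networks; random linear maps are used here purely for illustration.
state_dim, action_dim, num_actors = 8, 2, 4
actors = [rng.normal(size=(action_dim, state_dim)) for _ in range(num_actors)]
critic_w = rng.normal(size=(state_dim + action_dim,))

def critic(state, action):
    # Toy linear critic Q(s, a).
    return float(critic_w @ np.concatenate([state, action]))

state = rng.normal(size=state_dim)

# Each actor proposes a different action in the same state, so the critic
# produces diverse Q-values for that state.
actions = [w @ state for w in actors]
q_values = np.array([critic(state, a) for a in actions])

# Assumed Q-value weighting: a softmax-weighted combination of the per-actor
# Q-values, which lies between the over-optimistic max and pessimistic min.
weights = softmax(q_values, temperature=0.5)
weighted_q = float(weights @ q_values)
print(f"per-actor Q: {np.round(q_values, 3)}, weighted Q: {weighted_q:.3f}")
```

In this reading, the temperature controls how close the estimate sits to the maximum Q-value; a lower temperature approaches the greedy (max) estimate, while a higher one averages more evenly across actors.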
