Abstract
As a ubiquitous on-policy reinforcement learning algorithm, proximal policy optimization (PPO) has achieved state-of-the-art performance in both single-agent and cooperative multi-agent scenarios. However, it still suffers from instability and inefficiency in policy optimization because the likelihood ratio in its clipping strategy is not strictly restricted. In this work, we propose an activation likelihood-ratio (ALR) to address this issue. The ALR is restricted by a tanh activation function and can be employed in multiple functional clipping strategies. The resulting ALR clipping strategy produces a smooth yet steep objective curve, which provides high stationarity and efficiency in policy updates. Incorporating the ALR clipping strategy into the PPO loss function yields proximal policy optimization with activation likelihood-ratio (PPO-ALR). The rationality and superiority of the ALR-based objective function are proved and analyzed. Moreover, experiments on the Pistonball cooperative multi-agent game show that PPO-ALR is competitive with, and often superior to, standard PPO, PPO with rollback, and PPO-smoothed algorithms, in particular achieving higher efficiency and a higher probability of finding optimal policies in multi-agent environments.
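To make the described mechanism concrete, the sketch below shows how a tanh-restricted likelihood ratio could be plugged into PPO's clipped surrogate objective. The exact ALR definition is given in the paper body, not in this abstract; the form 1 + tanh(r - 1) used here is only an assumption for illustration, and all names (ppo_alr_surrogate, clip_eps) are hypothetical rather than the authors' notation.

    # Minimal PyTorch sketch of a tanh-activated likelihood ratio in a PPO-style
    # clipped surrogate. The ALR form below is assumed, not the paper's definition.
    import torch

    def ppo_alr_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
        ratio = torch.exp(logp_new - logp_old)            # standard likelihood ratio
        alr = 1.0 + torch.tanh(ratio - 1.0)               # assumed tanh-restricted ratio
        unclipped = alr * advantages
        clipped = torch.clamp(alr, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return torch.min(unclipped, clipped).mean()       # surrogate to be maximized

In practice the negation of this surrogate would be minimized with a standard optimizer, exactly as in vanilla PPO; only the ratio entering the clipping step changes.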