Abstract

The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose adding an action mask to the PPO algorithm. The mask indicates, for each state, whether an action is valid or invalid. Simulation results show that, compared with the original version, the proposed algorithm yields a much higher return within a moderate number of training steps. It is therefore useful and valuable to incorporate such a mask when applicable.
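
The abstract does not say how the mask enters the algorithm; a common way to realize action masking in a PPO implementation (an assumption here, not necessarily the paper's exact method) is to assign invalid actions a very large negative logit before the softmax, so the policy effectively never samples them. The sketch below assumes a discrete action space and a PyTorch policy; the tensor names (`logits`, `action_mask`) are hypothetical.

```python
import torch

def masked_action_distribution(logits, action_mask):
    """Apply a validity mask to policy logits before sampling.

    logits:      (batch, num_actions) raw policy-network outputs
    action_mask: (batch, num_actions) boolean, True where the action is valid
    """
    # Invalid actions receive a large negative logit, so their probability
    # after the softmax is effectively zero.
    masked_logits = torch.where(
        action_mask, logits, torch.full_like(logits, -1e8)
    )
    return torch.distributions.Categorical(logits=masked_logits)

# Usage: sample only valid actions and compute the log-probabilities
# that feed into the clipped PPO surrogate objective.
logits = torch.randn(4, 6)         # hypothetical policy output
mask = torch.rand(4, 6) > 0.3      # hypothetical per-state validity mask
mask[:, 0] = True                  # ensure at least one valid action per row
dist = masked_action_distribution(logits, mask)
actions = dist.sample()
log_probs = dist.log_prob(actions)
```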
