Reinforcement learning agents have been very successful in many Atari 2600 games. However, when applied to more complex and challenging environments, it is crucial to avoid falling into local optima, especially when the games contain many traps, large action spaces, challenging scenarios, and sparse successful episodes. In such cases, intrinsic motivation methods can easily fall into local optima, while excessive use of domain knowledge fails to transfer to different game designs. Therefore, to enhance the agent's ability to explore and to avoid catastrophic forgetting caused by the fading of intrinsic motivation, we develop a Trajectory Evaluation Module that integrates ideas from the Count-Based Exploration and Trajectory Replay methods. Moreover, our approach integrates well with the Self-Imitation Learning method and works effectively for hard-exploration video games. We evaluate our policy on two video games: Super Mario Bros and Sonic the Hedgehog. The experimental results show that our Trajectory Evaluation Module helps the agent pass through various obstacles and scenarios and successfully clear all levels of Super Mario Bros.
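The abstract refers to count-based exploration as one ingredient of the Trajectory Evaluation Module. The following is a minimal sketch of a standard count-based exploration bonus, not the paper's exact module; the class name, hashing scheme, and the `beta` coefficient are illustrative assumptions.

```python
# Sketch of a generic count-based exploration bonus (illustrative only;
# not the paper's Trajectory Evaluation Module).
from collections import defaultdict
import numpy as np


class CountBasedBonus:
    def __init__(self, beta: float = 0.1):
        self.beta = beta                     # scale of the intrinsic reward (assumed value)
        self.counts = defaultdict(int)       # visit counts keyed by a state hash

    def _key(self, state) -> int:
        # Hash the (possibly preprocessed) observation so repeated visits
        # to the same state map to the same counter.
        return hash(np.asarray(state).tobytes())

    def bonus(self, state) -> float:
        # Intrinsic reward proportional to 1 / sqrt(N(s)):
        # rarely visited states yield larger bonuses, encouraging exploration.
        key = self._key(state)
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])
```

In a typical setup, this bonus would be added to the environment reward at each step, so the agent is pushed toward under-visited states while the extrinsic reward still dominates once a region has been explored.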