Abstract

Experience replay is a key component of off-policy reinforcement learning (RL): it allows an agent to reuse past experience and reduces the correlation between training samples. Multi-Actor-Attention-Critic (MAAC) is a successful off-policy multi-agent reinforcement learning algorithm, owing to its good scalability. To accelerate convergence, we apply prioritized experience replay (PER) to optimize experience selection in MAAC and propose the PER-MAAC algorithm. In PER-MAAC, the priority of each experience is based on its temporal-difference (TD) error during training. The algorithm is evaluated on the Multi-UAV Cooperative Navigation and Rover-Tower scenarios, and the experimental results show that PER-MAAC improves the speed of convergence.
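For readers unfamiliar with TD-error-based prioritization, the sketch below illustrates the standard prioritized replay buffer idea (Schaul et al.) that the abstract refers to. It is a minimal illustration only: the class and method names, and the hyperparameters alpha (priority exponent) and beta (importance-sampling exponent), are assumptions of this sketch and are not taken from the PER-MAAC paper itself.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal sketch: sampling probability follows the absolute TD error."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priority shapes sampling
        self.eps = eps              # keeps priorities strictly positive
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions receive the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.storage) * probs[indices]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors):
        # Priority is set to the absolute TD error computed during the critic update.
        for idx, err in zip(indices, td_errors):
            self.priorities[idx] = abs(err) + self.eps
```

In a PER-style training loop, the critic's TD errors for the sampled batch are fed back through `update_priorities`, so transitions with larger errors are revisited more often.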
