Abstract

Hierarchical reinforcement learning (HRL) is a promising approach to long-horizon decision problems and complex tasks, as the high-level policy can guide the training of the low-level policy through macro actions and intrinsic rewards. However, current HRL algorithms disregard how much macro actions actually influence decision-making, even though this should determine how much intrinsic reward the low-level policy receives: when macro actions matter less to the decision, it is reasonable to give the low-level policy correspondingly smaller intrinsic rewards. In this paper, we propose a value-decomposition-based hierarchical multi-agent reinforcement learning method with intrinsic reward rectification, which determines the effectiveness of macro actions and corrects the intrinsic rewards accordingly. We show that the proposed method significantly outperforms state-of-the-art value decomposition approaches on the StarCraft Multi-Agent Challenge platform.
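
A minimal sketch of the rectification idea, not the paper's actual rule: below, a hypothetical importance weight is derived from the gap between the high-level Q-values, so that when all macro actions look nearly equally good (the macro choice barely influences the decision) the intrinsic reward passed to the low-level policy shrinks toward zero. The weight definition and function name are illustrative assumptions.

import numpy as np

def rectified_intrinsic_reward(q_macro, r_intrinsic, eps=1e-8):
    # q_macro     : high-level Q-values, one entry per macro action (assumed available)
    # r_intrinsic : raw intrinsic reward handed to the low-level policy
    # How decisively does the high-level policy prefer its best macro action?
    gap = q_macro.max() - q_macro.mean()
    scale = np.abs(q_macro).mean() + eps            # keeps the weight scale-free
    weight = float(np.clip(gap / scale, 0.0, 1.0))  # ~0 when all macro actions look alike
    return weight * r_intrinsic

# Near-indifferent high-level values: the macro action matters little, so the
# low-level policy receives only a small fraction of the intrinsic reward.
print(rectified_intrinsic_reward(np.array([0.52, 0.50, 0.51]), r_intrinsic=1.0))  # ~0.02
# One clearly dominant macro action: the intrinsic reward passes through in full.
print(rectified_intrinsic_reward(np.array([2.0, 0.1, 0.2]), r_intrinsic=1.0))     # 1.0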
