In many real-world scenarios, multiple agents must coordinate with one another because of their limited observation and communication capabilities. Deep multi-agent reinforcement learning has achieved significant success in such challenging settings by making use of value decomposition. A representative method is QMIX, which factorizes the global Q-value into individual per-agent Q-values and constrains the joint-action Q-value to be monotonic in each agent's Q-value through a mixing network. However, this monotonicity assumption prevents QMIX from representing value functions in which the ordering of an agent's actions depends on the actions of other agents. WQMIX addresses this restriction with two weighting schemes, but its weighting functions are simplistic, which limits performance; a more appropriate weighting scheme is therefore needed. To tackle this issue, we present a richer and more accurate weighting scheme, which we call Dynamic Weighting (DW), in contrast to the fixed weighting in WQMIX. Our proposed method, DW-QMIX, guarantees a more general decomposition than QMIX or WQMIX and assigns accurate importance to better joint actions, thereby leading to the optimal policy. Extensive experiments on simulation environments and real-life systems demonstrate that our method outperforms existing multi-agent reinforcement learning methods.
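As background, and as an illustrative sketch using standard notation from the QMIX/WQMIX literature rather than this paper's own symbols, the factorization, the monotonicity constraint, and the fixed weighting that DW replaces can be summarized as:

\[
Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) = f_{mix}\big(Q_1(\tau_1, u_1), \ldots, Q_n(\tau_n, u_n); s\big),
\qquad \frac{\partial Q_{tot}}{\partial Q_a} \ge 0 \quad \forall a,
\]
\[
\mathcal{L}(\theta) = \mathbb{E}\Big[\, w(s, \mathbf{u}) \,\big( Q_{tot}(\boldsymbol{\tau}, \mathbf{u}, s; \theta) - y \big)^2 \Big],
\qquad w(s, \mathbf{u}) \in (0, 1],
\]

where \(\boldsymbol{\tau}\) are the agents' action-observation histories, \(\mathbf{u}\) the joint action, \(s\) the global state, and \(y\) the target value. In WQMIX the weight \(w\) is fixed (for example, a constant \(\alpha < 1\) applied to overestimated joint actions); DW-QMIX instead computes the weight dynamically so that better joint actions receive greater importance.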