ABSTRACT The performance of multi-agent reinforcement learning (MARL) has usually been analysed in homogeneous teams under a few choices of sociality regime (selfish, egalitarian, or altruistic). In this paper we analyse both homogeneous and heterogeneous teams across a range of sociality regimes in the predator-prey game, using a novel normalisation of the weights so that the sum of all rewards is independent of the sociality regime. We find that the selfish regime is advantageous for both predator and prey teams, whether homogeneous or heterogeneous. In particular, rewards for the predator team are about 100% higher when switching from the egalitarian to the selfish regime, and more than 400% higher when switching from the altruistic regime; for the prey, the increases are around 40% and 100%, respectively. The results are similar in the homogeneous and heterogeneous settings. The takeaway message is that any study of homogeneous and heterogeneous cooperative-competitive MARL teams should also take the sociality regimes into account before drawing conclusions about the preference for any algorithm.
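A sum-preserving reward mixing of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: the function `mixed_rewards` and the mixing parameter `alpha` (0 for fully selfish, 1 for fully shared) are assumptions introduced here to show why blending each agent's reward with the team mean, with weights summing to one, keeps the total reward invariant across regimes.

```python
import numpy as np

def mixed_rewards(rewards, alpha):
    """Blend each agent's individual reward with the team-mean reward.

    alpha = 0.0 corresponds to a selfish regime, alpha = 1.0 to fully
    shared rewards. This is an illustrative normalisation (weights
    (1 - alpha) and alpha sum to one per agent), not necessarily the
    scheme used in the paper.
    """
    rewards = np.asarray(rewards, dtype=float)
    team_mean = rewards.mean()
    return (1.0 - alpha) * rewards + alpha * team_mean

# Hypothetical per-agent rewards for a three-agent team.
r = [3.0, 1.0, 2.0]
for a in (0.0, 0.5, 1.0):
    mixed = mixed_rewards(r, a)
    # The team total is identical in every regime, so regimes can be
    # compared without the total reward scale changing.
    print(f"alpha={a}: rewards={mixed.tolist()}, total={mixed.sum()}")
```

Because each agent's weights sum to one and the shared component is the team mean, the total reward equals the original sum of individual rewards for any `alpha`, which is the property the normalisation in the abstract is designed to guarantee.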