The emergence of collective order in swarms from local, myopic interactions of their individual members is of interest to biology, sociology, psychology, computer science, robotics, physics and economics. Cooperative swarms, whose members unknowingly work towards a common goal, are particularly perplexing: members sometimes take individual actions that maximize collective utility at the expense of their own. This seems to contradict expectations of individual rationality. Moreover, members choose these actions without knowing their effect on the collective utility. I examine this puzzle through game theory, machine learning and robots. I show that in some settings, the collective utility can be transformed into individual rewards that can be measured locally: when interacting, members individually choose actions that receive a reward based on how quickly the interaction was resolved, how much individual work time is gained and the approximate effect on others. This internally measurable reward is individually and independently maximized by learning. This results in an equilibrium, where the learned response of each individual maximizes both its individual reward and the collective utility, i.e. both the swarm and the individuals are rational. This article is part of the theme issue 'The road forward with swarm systems'.
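To make the idea of a locally measurable reward concrete, the sketch below shows one possible form it could take: each member scores an interaction by its resolution time, the individual work time gained and an estimate of the effect on others, and maximizes that score with independent Q-learning. This is a minimal illustration under assumed names, weights and learning rules; it is not the paper's actual formulation.

```python
import random
from collections import defaultdict

def local_reward(resolution_time, work_time_gained, estimated_effect_on_others):
    """Hypothetical local reward combining the three components named in the
    abstract: faster resolution, more individual work time, smaller cost to others."""
    return -resolution_time + work_time_gained + estimated_effect_on_others

class IndependentQLearner:
    """One swarm member learning its own policy from locally measured rewards only."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy choice over the member's own value estimates.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update driven purely by the local reward signal.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

Under the abstract's claim, if the collective utility can be decomposed into rewards of this locally measurable form, then each member running such an independent learner converges to a response that maximizes both its own reward and the collective utility.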