AN EMPIRICAL STUDY OF POTENTIAL-BASED REWARD SHAPING AND ADVICE IN COMPLEX, MULTI-AGENT SYSTEMS

Sam Devlin,Daniel Kudenko,Marek Grześ

doi:10.1142/s0219525911002998

Sam Devlin, Daniel Kudenko + Show 1 more

Open Access

https://doi.org/10.1142/s0219525911002998

Copy DOI

Abstract

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of reward shaping in two problem domains within the context of RoboCup KeepAway by designing three reward shaping schemes, encouraging specific behaviour such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.

Full Text