Abstract

Defining a reward function that, when optimized, leads to rapid acquisition of an optimal policy is one of the most challenging problems in deploying reinforcement learning algorithms. Existing works on the optimal reward problem (ORP) propose mechanisms to design reward functions, but their application is limited to specific sub-classes of single- or multi-agent reinforcement learning problems. Moreover, these methods identify which rewards should be given in which situations, but not which aspects of the state or environment should be used when defining the reward function, and they do not directly model how quickly an optimal policy can be learned by optimizing a given candidate reward function. In this paper, we define the extended optimal reward problem (EORP), which: i) identifies both the reward features and the reward weights that compose the reward function; ii) is general enough to handle single- and multi-agent reinforcement learning problems; iii) scales to problems with a large number of agents learning simultaneously; and iv) incorporates a learning effort metric into the evaluation of reward functions, allowing the discovery of reward functions that lead to faster learning. Experimental results on gridworld-like and traffic assignment scenarios are used to evaluate the efficiency of our approach in designing effective reward functions.
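To make the idea concrete, the sketch below illustrates (in very simplified form) the kind of search the abstract describes: candidate reward functions are pairs of a reward-feature subset and a weight vector, and each candidate is scored by a learning-effort proxy (episodes needed to reach a target return). This is only an assumed illustration; names such as `train_agent` and `FEATURES` are hypothetical and do not reflect the paper's actual formulation or implementation.

```python
# Hypothetical sketch of a reward-function search driven by a learning-effort metric.
import itertools
import random

FEATURES = ["distance_to_goal", "collisions", "travel_time"]  # candidate reward features (illustrative)

def train_agent(feature_subset, weights, max_episodes=500):
    """Placeholder learner: returns the number of episodes needed to reach a
    target return under the candidate reward (simulated at random here)."""
    return random.randint(50, max_episodes)  # stand-in for an actual RL run

def evaluate_candidate(feature_subset, weights):
    # Learning-effort metric: fewer episodes to the target return is better.
    return train_agent(feature_subset, weights)

best, best_effort = None, float("inf")
for k in range(1, len(FEATURES) + 1):
    for subset in itertools.combinations(FEATURES, k):
        weights = [random.uniform(-1.0, 1.0) for _ in subset]  # sampled reward weights
        effort = evaluate_candidate(subset, weights)
        if effort < best_effort:
            best, best_effort = (subset, weights), effort

print("Best reward features/weights:", best, "with learning effort:", best_effort)
```

In practice, the evaluation step would run the actual (single- or multi-agent) learning process under the candidate reward rather than the random placeholder used above.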
