Abstract

In multi-agent reinforcement learning (MARL), the behavior of each agent influences the learning of the others, and the agents must search an exponentially enlarged joint-action space. Exploration is therefore difficult for multi-agent teams: agents may converge to suboptimal policies and fail to solve complex tasks. To improve exploration efficiency and task performance, in this paper we propose a new approach that transfers knowledge across tasks. Unlike traditional MARL algorithms, we first assume that reward functions can be expressed as linear combinations of a shared feature function and a set of task-specific weights. We then define a set of basic MARL tasks in the source domain and pre-train policies on them as reusable basic knowledge. Finally, once the weights of a target task are available, a well-performing exploration policy for the target domain can be obtained more easily, so learning on target tasks is accelerated by making full use of the previously acquired knowledge. We evaluate the proposed algorithm on two challenging MARL tasks: cooperative box-pushing and non-monotonic predator-prey. Experimental results demonstrate improved performance compared with state-of-the-art MARL algorithms.
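
To make the assumed reward decomposition concrete, the following is a minimal sketch: each task's reward is a linear combination r(s, a) = phi(s, a) . w of a shared feature function and a task-specific weight vector. The feature function phi, the weight vectors, and the toy three-dimensional feature space below are illustrative placeholders, not the paper's implementation.

import numpy as np

# Minimal sketch of the assumed reward decomposition r(s, a) = phi(s, a) . w.
# `phi`, the weight vectors, and the 3-dimensional feature space are
# illustrative assumptions, not the paper's own implementation.

def phi(state, joint_action):
    """Shared feature function over a state and the agents' joint action."""
    return np.array([
        float(np.sum(state)),      # toy aggregate-state feature
        float(len(joint_action)),  # toy team-size feature
        1.0,                       # bias feature
    ])

def reward(state, joint_action, w):
    """Task reward as a linear combination of shared features and weights."""
    return phi(state, joint_action) @ w

# Each basic (source) task is defined by its own weight vector over the
# shared features; a target task is expressed in the same basis.
w_source_tasks = [np.array([1.0, 0.0, -0.1]),
                  np.array([0.0, 1.0, -0.1])]
w_target = np.array([0.5, 0.5, -0.1])

state = np.array([0.2, 0.8])
joint_action = (1, 3)  # one discrete action per agent
print(reward(state, joint_action, w_target))  # 0.5*1.0 + 0.5*2.0 - 0.1 = 1.4

This decomposition resembles the successor-features transfer setting, where knowing the target weights lets knowledge pre-trained on source tasks be re-weighted for the target task rather than learned from scratch; the abstract does not name that framework, so the connection is an interpretive assumption here.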
