Abstract
This paper proposes a new distributed multi-agent Actor-Critic algorithm for reinforcement learning, aimed at solving multi-agent multi-task optimization problems. The Critic takes the form of a Distributed Emphatic Temporal Difference algorithm, DETD(λ), while the Actor is a complementary consensus-based policy gradient algorithm, derived from a global objective function that plays the role of a scalarizing function in multi-objective optimization. It is demonstrated that the Feller-Markov properties hold for the newly derived Actor algorithm. A proof of the weak convergence of the algorithm to the limit set of an attached ODE is derived under mild conditions, using a specific decomposition between the Critic and the Actor algorithms together with two-time-scale stochastic approximation arguments. An experimental verification of the algorithm's properties is given, showing that the algorithm can be an efficient tool in practice.
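To make the two-time-scale actor-critic structure described above concrete, the following is a minimal sketch of one synchronous update round: each agent performs a local emphatic TD(λ) critic step and a policy gradient actor step, after which parameters are mixed through a consensus matrix. All names here (the followon trace `F`, the mixing matrix `C`, the step sizes `beta_c`, `beta_a`) are illustrative assumptions, not the paper's notation, and this is not the paper's exact algorithm.

```python
# Hypothetical sketch of a distributed two-time-scale actor-critic update,
# assuming linear value features phi(s), per-agent importance ratios rho,
# and a doubly stochastic consensus matrix C. Illustrative only.
import numpy as np

N, d = 4, 8                       # number of agents, feature dimension
rng = np.random.default_rng(0)
C = np.full((N, N), 1.0 / N)      # doubly stochastic consensus matrix
w = rng.normal(size=(N, d))       # per-agent critic weights
theta = rng.normal(size=(N, d))   # per-agent actor parameters
e = np.zeros((N, d))              # eligibility traces
F = np.ones(N)                    # followon (emphasis) traces
lam, gamma = 0.9, 0.95
beta_c, beta_a = 0.05, 0.005      # critic step faster than actor (two time scales)

def step(phi_s, phi_s2, r, rho, grad_logpi):
    """One synchronous update from a transition (s, a, r, s')."""
    global w, theta
    for i in range(N):
        F[i] = gamma * rho[i] * F[i] + 1.0                # followon trace (interest = 1)
        M = lam + (1.0 - lam) * F[i]                      # emphasis weighting
        e[i] = rho[i] * (gamma * lam * e[i] + M * phi_s)  # emphatic eligibility trace
        delta = r[i] + gamma * w[i] @ phi_s2 - w[i] @ phi_s
        w[i] += beta_c * delta * e[i]                     # local ETD(lambda) critic step
        theta[i] += beta_a * rho[i] * M * delta * grad_logpi[i]  # actor step
    w = C @ w                                             # consensus on critic weights
    theta = C @ theta                                     # consensus on actor parameters

# Example call with synthetic data:
phi_s, phi_s2 = rng.normal(size=d), rng.normal(size=d)
step(phi_s, phi_s2, r=rng.normal(size=N), rho=np.ones(N),
     grad_logpi=rng.normal(size=(N, d)))
```

The separation beta_c >> beta_a reflects the two-time-scale argument: the critic effectively tracks the value function induced by a slowly varying policy, while the consensus step drives the agents' parameters toward agreement.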