Abstract

Development of distributed Multi-Agent Reinforcement Learning (MARL) algorithms has attracted a surge of interest lately. Generally speaking, conventional Model-Based (MB) or Model-Free (MF) RL algorithms are not directly applicable to MARL problems due to their use of a fixed reward model for learning the underlying value function. While Deep Neural Network (DNN)-based solutions perform well, they remain prone to overfitting, high sensitivity to parameter selection, and sample inefficiency. In this paper, an adaptive Kalman Filter (KF)-based framework is introduced as an efficient alternative to address the aforementioned problems by capitalizing on unique characteristics of the KF such as uncertainty modeling and online second-order learning. More specifically, the paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its Successor Representation-based variant, referred to as MAK-SR. The proposed MAK-TD/SR frameworks account for the continuous nature of the action space associated with high-dimensional multi-agent environments and exploit Kalman Temporal Difference (KTD) to address parameter uncertainty. The proposed MAK-TD/SR frameworks are evaluated via several experiments, implemented through the OpenAI Gym MARL benchmarks. In these experiments, different numbers of agents in cooperative, competitive, and mixed (cooperative-competitive) scenarios are utilized. The experimental results illustrate superior performance of the proposed MAK-TD/SR frameworks compared to their state-of-the-art counterparts.

Highlights

  • Reinforcement Learning (RL), as a class of Machine Learning (ML) techniques, targets providing human-level adaptive behavior by construction of an optimal control policy [1]. Generally speaking, the main underlying objective is learning from previous interactions between an autonomous agent and its surrounding environment

  • Novelty: The novelty of the proposed frameworks lies in the integration of Kalman temporal difference, multiple-model adaptive estimation, and successor representation for Multi-Agent RL (MARL) problems

  • The agents are trained over different numbers of episodes, after which 10 test iterations, each of 1000 episodes, are run to compute different metrics evaluating the performance and efficiency of the proposed Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD)/Successor Representation (SR) frameworks


Introduction

Reinforcement Learning (RL), as a class of Machine Learning (ML) techniques, targets providing human-level adaptive behavior by construction of an optimal control policy [1]. Generally speaking, the main underlying objective is learning (via trial and error) from previous interactions of an autonomous agent and its surrounding environment. The optimal control (action) policy can be obtained via RL algorithms through the feedback that the environment provides to the agent after each of its actions [2–9]. In most successful RL applications, e.g., Go and Poker games, robotics, and autonomous driving, several autonomous agents are typically involved. This naturally falls within the context of Multi-Agent RL (MARL), a relatively long-established domain that has recently been revitalized by the advancements made in the single-agent
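To make the KTD mechanism named above concrete, the following is a minimal sketch of a Kalman Temporal Difference update for linear value-function approximation, in the spirit of the framework described (not the authors' implementation). The value weights are treated as the hidden state of a Kalman filter with random-walk dynamics, and each reward is a noisy observation of the TD residual; all class and parameter names here are hypothetical.

```python
import numpy as np

class KTDLearner:
    """Hedged sketch of a Kalman Temporal Difference (KTD) learner.

    The weight vector w of a linear value function V(s) = phi(s) @ w is the
    Kalman-filter state; the reward is observed through the TD observation
    model r ~ (phi(s) - gamma * phi(s')) @ w + noise.
    """

    def __init__(self, n_features, gamma=0.99, process_noise=1e-3, obs_noise=1.0):
        self.w = np.zeros(n_features)                 # weight mean (KF state)
        self.P = np.eye(n_features)                   # weight covariance
        self.gamma = gamma
        self.Q = process_noise * np.eye(n_features)   # random-walk process noise
        self.R = obs_noise                            # observation noise variance

    def update(self, phi_s, phi_next, reward, done):
        # Prediction step: random-walk evolution inflates the covariance.
        self.P = self.P + self.Q
        # Observation vector of the TD model; no bootstrap on terminal steps.
        h = phi_s - (0.0 if done else self.gamma) * phi_next
        residual = reward - h @ self.w                # innovation (TD error)
        s_var = h @ self.P @ h + self.R               # innovation variance
        k = self.P @ h / s_var                        # Kalman gain
        # Correction step: second-order update scaled by weight uncertainty.
        self.w = self.w + k * residual
        self.P = self.P - np.outer(k, h @ self.P)
        return residual

    def value(self, phi):
        return phi @ self.w
```

Because the gain `k` is derived from the covariance `P`, step sizes shrink automatically as uncertainty decreases, which is the online second-order learning and uncertainty modeling the abstract attributes to the KF.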

