Abstract

Fuzzy Q-learning extends Q-learning to continuous state spaces and has been applied to a wide range of applications such as robot control. In a multi-agent system, however, the non-stationary environment makes it difficult for the joint policy to converge. To give agents more suitable rewards in a multi-agent environment, multi-agent reward-iteration fuzzy Q-learning (RIFQ) is proposed for multi-agent cooperative tasks. The state space is divided into three channels by the proposed state-divider using fuzzy logic. The reward of each agent is reshaped iteratively according to its state, an update sequence is constructed by computing the relations among the states of different agents, and the value functions are then updated top-down. By replacing the reward given by the environment with the reshaped reward, agents avoid the most unreasonable punishments and receive rewards selectively. RIFQ provides a feasible reward relationship among agents, which makes multi-agent training more stable. Several simulation experiments show that RIFQ is not limited by the number of agents and converges faster than the baselines.
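The following is a minimal Python sketch of the general idea described above: each agent's environment reward is replaced by a state-dependent reshaped reward, and the agents' value functions are updated in an order derived from their state relations. All names (FuzzyQAgent, reshape_reward, topdown_update) and the specific reshaping and ordering rules are illustrative assumptions, not the paper's actual RIFQ implementation; a tabular agent stands in for a true fuzzy Q-learning agent.

import numpy as np

class FuzzyQAgent:
    """Tabular stand-in for a fuzzy Q-learning agent.

    Real fuzzy Q-learning interpolates Q-values over fuzzy membership
    functions; a small discrete table keeps this sketch self-contained.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.q = np.zeros((n_states, n_actions))
        self.alpha = alpha
        self.gamma = gamma

    def update(self, s, a, r, s_next):
        # Standard Q-learning target, but `r` is the reshaped reward.
        td_target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (td_target - self.q[s, a])


def reshape_reward(env_reward, own_state, peer_states):
    """Replace the environment reward with a state-dependent reshaped reward.

    Placeholder rule: drop a punishment when a peer's state is adjacent
    (shared responsibility), keep positive rewards. The paper's actual rule
    comes from its fuzzy state-divider and is not reproduced here.
    """
    if env_reward < 0 and any(abs(own_state - p) <= 1 for p in peer_states):
        return 0.0  # avoid an "unreasonable" punishment
    return env_reward


def topdown_update(agents, transitions):
    """Update agents in an order derived from their state relations.

    Placeholder ordering: agents with larger state indices are updated first,
    standing in for the paper's computed update sequence.
    """
    order = sorted(range(len(agents)),
                   key=lambda i: transitions[i][0], reverse=True)
    for i in order:
        s, a, env_r, s_next = transitions[i]
        peers = [transitions[j][0] for j in range(len(agents)) if j != i]
        r = reshape_reward(env_r, s, peers)
        agents[i].update(s, a, r, s_next)


# Toy usage: two agents, five states, two actions, one joint transition.
agents = [FuzzyQAgent(n_states=5, n_actions=2) for _ in range(2)]
topdown_update(agents, [(3, 0, -1.0, 2), (4, 1, 1.0, 4)])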
