Research on Tensor-Based Cooperative and Competitive in Multi-Agent Reinforcement Learning

Tsega Weldu Araya, A P Yuan Ling, Md Rashed Ibn Nawab

doi:10.24018/ejece.2020.4.6.262

Abstract

As technology grows rapidly, the variety of information and the density of work become difficult to manage. Machine learning (ML) was developed to reduce this workload and human labor. Reinforcement learning (RL) is a recent advancement in ML research, and multi-agent reinforcement learning (MARL) is used to train multiple agents in a shared environment. Previous studies focused on two-agent cooperation and held their data in a two-dimensional array, that is, a matrix. The limitation of this representation appears as the agents' training data increases: the growth creates storage drawbacks and data redundancy. Our first aim in this research is to improve an algorithm so that it can represent MARL training in a tensor. In MARL, multiple agents work together to achieve a joint task. To share the training records and data of numerous agents, we need to collect the agents' previous cumulative experience in a tensor. Secondly, we explore the agents' cooperation and competition in MARL in terms of local and global goals. The local goal is the cooperation of agents within a group or team, where we use a student-teacher training model. The global goal is the competition between two opposing teams to acquire the reward. Every learning agent has its own Q table for storing its training data in an environment. As the number of learning agents grows, so does the training experience in their Q tables, and representing this data becomes the most challenging issue. We introduce a tensor to store this data and resolve the data representation challenges in multi-agent settings. A tensor is often presented as a three-dimensional array, although in general it is an N-way array, which is useful for representing and accessing large amounts of data.
Finally, we implement an algorithm for training three cooperative agents against an opposing team using a tensor-based framework in the Q-learning algorithm. We provide an algorithm that can store the training records and data of multiple agents. The tensor achieves a smaller storage size than the matrix for the agents' training records. In addition, three-agent cooperation helps attain the maximum optimal reward.
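The idea of holding several agents' Q tables in one tensor can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the axis order [agent, state, action], the sizes, and the helper name make_q_tensor are hypothetical, and NumPy is assumed as the array library. The sketch shows the representation only and does not reproduce the storage comparison reported in the abstract.

```python
import numpy as np

# Hypothetical sizes chosen for illustration; the paper does not specify them here.
N_AGENTS, N_STATES, N_ACTIONS = 3, 5, 4


def make_q_tensor(n_agents, n_states, n_actions):
    """Return a zero-initialized 3-way array indexed as [agent, state, action]."""
    return np.zeros((n_agents, n_states, n_actions))


q = make_q_tensor(N_AGENTS, N_STATES, N_ACTIONS)

# Record one value: agent 1, state 2, action 3.
q[1, 2, 3] = 0.5

# The ordinary 2-D Q table of a single agent is one slice of the tensor.
agent_table = q[1]

# Greedy action of every agent in state 2, computed in one call.
greedy = q[:, 2, :].argmax(axis=1)
```

Slicing along the first axis recovers each agent's individual Q table, so one structure can serve both per-agent lookups and team-wide queries.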

Highlights

Reinforcement learning is a framework in which an agent learns by trial and error through interaction with its surrounding environment
A Markov game is a stochastic game developed to resolve the limitations of the Markov decision process (MDP) and of game theory in multi-agent settings
Reinforcement learning (RL) is the area of machine learning used to solve challenging problems; most successful work has been done on learning with a single agent

Summary

Introduction

Reinforcement learning is a framework in which an agent learns by trial and error through interaction with its surrounding environment. Artificial Intelligence (AI) is a branch of computer science dealing with the simulation of intelligent behavior in a computer. It is a technology [9] that makes it possible for a machine to acquire experience by adjusting to new input and to execute human-like tasks. It involves observing the characteristics of human intelligence and expressing them as algorithms in a computer, so that a machine's intelligence becomes comparable to that of humans. RL allows machines and software agents to learn how to behave in an environment by performing actions and seeking the maximum related reward. Its learning steps are: observe the current state S; choose an action A and execute it; receive an immediate reward R; and perceive the new state S'. RL was originally used in single-agent learning environments, but increasing the number of agents opens up new learning methods.
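The learning steps above (state S, action A, reward R, new state S') can be sketched with a standard single-agent tabular Q-learning update. This is a minimal sketch under stated assumptions, not the paper's implementation: the chain environment toy_env_step, the step size alpha, the discount gamma, and the purely random action choice are all hypothetical choices made for illustration.

```python
import numpy as np


def q_learning_step(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the table."""
    td_target = r + gamma * q[s_next].max()
    q[s, a] += alpha * (td_target - q[s, a])
    return q


def toy_env_step(s, a, n_states):
    """Hypothetical chain: action 1 moves right, action 0 moves left.

    Entering the last state yields reward 1; every other move yields 0.
    """
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return r, s_next


N_STATES, N_ACTIONS = 4, 2
rng = np.random.default_rng(0)
q = np.zeros((N_STATES, N_ACTIONS))

s = 0
for _ in range(2000):
    # Choose A uniformly at random (assumed exploration rule; Q-learning is
    # off-policy, so it can still learn from random behavior).
    a = int(rng.integers(N_ACTIONS))
    r, s_next = toy_env_step(s, a, N_STATES)   # receive R, perceive S'
    q_learning_step(q, s, a, r, s_next)
    # Restart the episode once the goal state is reached.
    s = 0 if s_next == N_STATES - 1 else s_next
```

After training, the value of moving right in the state next to the goal exceeds the value of moving left, which is the behavior one would expect on this chain.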

Objectives

Methods

Results