Reward Value-Based Goal Selection for Agents’ Cooperative Route Learning Without Communication in Reward and Goal Dynamism

Fumito Uwano, Keiki Takadama

doi:10.1007/s42979-020-00191-2

Open Access

Abstract

This paper proposes a goal selection method that enables agents to obtain the maximum reward per time step through learning without communication. In particular, the method aims to let agents cooperate when reward values and goal locations change dynamically. Adapting to these changes allows agents to learn cooperative actions as transportation tasks and incomes (rewards) change, for example when a storehouse switches between transporting heavy/valuable items and light/valueless items. Concretely, this paper extends a previous noncommunicative cooperative action learning method, Profit minimizing reinforcement learning with oblivion of memory (PMRL-OM), and defines two unified conditions that combine the number of time steps and the reward. One condition is the approximate number of time steps that would be required if the expected reward values of all purposes (goals) were equal; the other is the minimum number of time steps divided by the reward value. The proposed method makes all agents learn to achieve the purposes in ascending order of these condition values, smallest first. Each agent then learns a cooperative policy with PMRL-OM, as in the previous method. This paper analyzes the two unified conditions and shows that, unlike the other condition, the one based on the approximate number of time steps weights both evaluations (time steps and reward) almost equally; that is, it helps agents choose appropriate purposes among candidates that differ only slightly in terms of the two evaluations. This paper empirically compares PMRL-OM with each of the two conditions against the original PMRL-OM in three grid world problems in which goal locations and reward values change dynamically. The results show that the unified conditions outperform PMRL-OM without them in these grid world problems.
In particular, the condition based on the approximate number of time steps directs the agents toward appropriate goals.
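The second unified condition is stated explicitly: the minimum number of time steps to a goal divided by that goal's reward value, with goals pursued from the smallest value upward. A minimal Python sketch of that ordering rule follows. It assumes each goal's minimum step count and current reward are already known. The `Goal` class, the goal names, and the numbers are hypothetical illustrations, not data from the paper. The first condition (approximate time steps under equal expected rewards) is not sketched because the abstract does not give its formula, and the subsequent PMRL-OM policy learning is also outside this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Goal:
    name: str        # hypothetical goal identifier
    min_steps: int   # assumed known: minimum number of time steps to reach the goal
    reward: float    # reward value currently assigned to the goal


def steps_per_reward(goal: Goal) -> float:
    """Condition: minimum number of time steps divided by the reward value."""
    if goal.reward <= 0:
        raise ValueError("reward must be positive")
    return goal.min_steps / goal.reward


def goal_order(goals: list[Goal]) -> list[str]:
    """Return goal names sorted so the smallest condition value comes first."""
    return [g.name for g in sorted(goals, key=steps_per_reward)]


if __name__ == "__main__":
    # Hypothetical goals; condition values are 0.6, 1.0 and 0.4.
    goals = [
        Goal("A", min_steps=6, reward=10.0),
        Goal("B", min_steps=4, reward=4.0),
        Goal("C", min_steps=12, reward=30.0),
    ]
    print(goal_order(goals))  # ['C', 'A', 'B']

    # Reward dynamism: B's reward rises to 20, so its value drops to 0.2.
    goals[1] = replace(goals[1], reward=20.0)
    print(goal_order(goals))  # ['B', 'C', 'A']
```

Recomputing the order whenever a reward changes is one simple way to picture the reward dynamism the paper targets: in the example, raising goal B's reward from 4 to 20 moves it from last to first.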

Journal: SN Computer Science	Publication Date: May 1, 2020
License type: open-access

Similar Papers

Awareness drives changes in reward value which predict eating behavior change: Probing reinforcement learning using experience sampling from mobile mindfulness training for maladaptive eating.
Véronique A Taylor ... Alexandra Roy
Journal of Behavioral Addictions | VOL. 10
15 Jul 2021

Distributed Computation of Minimum Step Consensus for Discrete Time Multi-Agent Systems
Deepak U Patil ... Ameer K Mulla
01 Jan 2019

Fire containment in grids of dimension three and higher
Mike Develin ... Stephen G Hartke
Discrete Applied Mathematics | VOL. 155
09 Jun 2007

The orbitofrontal cortex and emotion in health and disease, including depression
Edmund T Rolls
Neuropsychologia | VOL. 128
24 Sep 2017