Abstract
Success in cooperative tasks may be compromised when cooperation stagnates or fails in the absence of information sharing; however, information sharing may require excessive memory. Alternative methods are therefore needed to encourage agents to act in accordance with the needs of the team. In this study, a method called the cooperative tendency model using Q-learning (CTM-Q) is proposed for a partial-communication multiagent team. Each agent maintains and records its tendency values (encouraging cooperation) and Q-values (encouraging goal-seeking) as inputs to a payoff function used for action selection, and each agent selects the action with the highest payoff in the current state. The method improves learning performance, enabling agents to reach a consensus rapidly. In simulations, the proposed method accelerated learning in multiagent cooperative applications and outperformed competing methods in solution speed, convergence time, and stability.
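To make the action-selection idea concrete, the sketch below illustrates one plausible reading of the abstract: each agent keeps a table of Q-values (goal-seeking) and tendency values (cooperation) and acts greedily on a combined payoff. The class name `CTMQAgent`, the weighted-sum payoff with weight `beta`, and the `coop_signal` update are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class CTMQAgent:
    """Minimal sketch in the spirit of CTM-Q; details are assumptions."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, beta=0.5):
        self.Q = np.zeros((n_states, n_actions))  # goal-seeking values
        self.T = np.zeros((n_states, n_actions))  # cooperative tendency values
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def payoff(self, state):
        # ASSUMPTION: payoff combines goal-seeking and cooperative signals
        # as a weighted sum; the paper's actual payoff function may differ.
        return (1 - self.beta) * self.Q[state] + self.beta * self.T[state]

    def select_action(self, state):
        # Select the action with the highest payoff in the current state.
        return int(np.argmax(self.payoff(state)))

    def update_q(self, s, a, reward, s_next):
        # Standard Q-learning update for the goal-seeking component.
        td_target = reward + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

    def update_tendency(self, s, a, coop_signal):
        # ASSUMPTION: tendency values are nudged toward a cooperation
        # signal, e.g., teammates' observed preference for this action.
        self.T[s, a] += self.alpha * (coop_signal - self.T[s, a])
```

Under these assumptions, raising `beta` biases an agent toward team-aligned actions, while lowering it recovers plain greedy Q-learning.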