Accelerated Transfer Learning for Cooperative Transportation Formation Change via SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization)

Almira Budiyanto, Keisuke Azetsu, Nobutomo Matsunaga

doi:10.3390/automation5040034

Abstract

Cooperative transportation that requires formation changes while the robots are traveling is attracting growing interest. Deep reinforcement learning is used to learn formation changes in multi-robot systems. The MADDPG (Multi-Agent Deep Deterministic Policy Gradient) method is widely used in known environments. In unknown circumstances, however, MADDPG may require re-learning. Although extensions of MADDPG based on model-based learning and imitation learning have been applied to reduce learning time, it is unclear how learned results can be transferred when the number of robots changes. For example, in the GASIL-MADDPG (Generative Adversarial Self-Imitation Learning and Multi-Agent Deep Deterministic Policy Gradient) method, it is uncertain how the results of training with three robots can be transferred to the neural networks of four robots. Scaled Dot Product Attention (SDPA) has recently attracted attention for its speed and accuracy in natural language processing. Combining transfer learning with fast computation improves the efficiency of edge-level re-learning. This paper proposes a formation change algorithm that combines SDPA with MAPPO (Multi-Agent Proximal Policy Optimization) and allows easier and faster multi-robot knowledge transfer than other methods. The algorithm applies SDPA to multi-robot formation learning and achieves fast learning by transferring the acquired knowledge of formation changes to a given number of robots. The proposed algorithm is verified in simulations of robot formation change, where it achieved dramatically faster learning. The proposed SDPA-MAPPO learned 20.83 times faster than the Deep Dyna-Q method. Furthermore, with transfer learning from a three-robot case to a five-robot case, the learning time was reduced by about 56.57 percent. The three-robot to five-robot scenario was chosen because these team sizes are common in cooperative robotics.
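
The abstract does not give the network details, so the following is only a minimal sketch of generic scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, written in plain Python. It illustrates one plausible reason attention suits transfer across team sizes: the projection matrices depend on the per-robot feature size, not on the number of robots, so the same weights can process three or five robots. The feature size, the key size, the random weights, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(obs, w_q, w_k, w_v):
    """Scaled dot-product attention over per-robot feature rows.

    obs has one row per robot. The projection matrices are sized by the
    feature dimension only, so they do not change with the robot count.
    """
    q, k, v = matmul(obs, w_q), matmul(obs, w_k), matmul(obs, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights or observations (illustrative only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
FEATURES, D_K = 4, 8  # hypothetical sizes, not taken from the paper

# One set of projection weights, standing in for weights learned with three robots.
w_q = random_matrix(FEATURES, D_K, rng)
w_k = random_matrix(FEATURES, D_K, rng)
w_v = random_matrix(FEATURES, D_K, rng)

obs_three = random_matrix(3, FEATURES, rng)  # three-robot observations
obs_five = random_matrix(5, FEATURES, rng)   # five-robot observations

# The same weights are reused unchanged for both team sizes.
out_three, attn_three = sdpa(obs_three, w_q, w_k, w_v)
out_five, attn_five = sdpa(obs_five, w_q, w_k, w_v)

print(len(out_three), len(out_three[0]))  # 3 rows, D_K columns
print(len(out_five), len(out_five[0]))    # 5 rows, D_K columns
```

In this sketch only the attention matrix grows (3x3 versus 5x5) while the weights stay fixed. By contrast, a network that takes a fixed-length concatenation of all robots' states would need a resized input layer when the team grows, which is the kind of transfer difficulty the abstract raises for GASIL-MADDPG.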

Journal: Automation
Publication Date: Nov 27, 2024
License type: CC BY 4.0

