Abstract

A Multi-Agent Motion Prediction and Tracking method based on non-cooperative equilibrium (MPT-NCE) is proposed to address the fact that some multi-agent intelligent evolution methods, such as the MADDPG, lack adaptability in unfamiliar environments and cannot achieve multi-agent motion prediction and tracking, despite their advantages in multi-agent intelligence. Featuring a performance discrimination module based on the time difference function and a random mutation module that applies predictive learning, the MPT-NCE improves the prediction and tracking ability of agents in intelligent game confrontation. Two groups of multi-agent prediction and tracking experiments are conducted, and the results show that, compared with the MADDPG method, the MPT-NCE achieves a prediction rate above 90%, which is 23.52% higher, and increases the overall evolution efficiency by 16.89%; in terms of tracking ability, the MPT-NCE accelerates convergence by 11.76% and improves target tracking by 25.85%. The proposed MPT-NCE method thus shows strong environmental adaptability as well as prediction and tracking ability.
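
The abstract names two mechanisms: a performance discrimination module built on the time difference (TD) function and a random mutation module that applies predictive learning. The paper does not provide code; the following Python sketch only illustrates, under stated assumptions, how a TD-error score could rank agents and how a mutation step could perturb the weakest agent's policy. All names here (td_error, discriminate, mutate_policy, gamma, sigma) are illustrative and not taken from the source.

```python
# Minimal sketch, not the authors' implementation: TD-error-based performance
# discrimination plus a Gaussian random-mutation step, as one plausible reading
# of the two modules described in the abstract.
import numpy as np

def td_error(reward, value, next_value, gamma=0.99):
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value

def discriminate(agent_td_errors):
    """Rank agents by mean |TD error|; a larger error indicates poorer value prediction."""
    scores = {name: float(np.mean(np.abs(errs))) for name, errs in agent_td_errors.items()}
    return max(scores, key=scores.get)  # the worst-performing agent

def mutate_policy(weights, sigma=0.05, rng=None):
    """Perturb the selected agent's policy parameters with small Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    return [w + sigma * rng.standard_normal(w.shape) for w in weights]

# Usage: after an episode, pick the weakest agent and mutate its policy weights.
errors = {"pursuer_0": [0.4, -0.2, 0.6], "pursuer_1": [0.1, 0.05, -0.1]}
worst = discriminate(errors)
print("agent selected for mutation:", worst)
```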

Highlights

  • In multi-agent intelligence, training failures such as non-convergence or low training speed are common if only individual-agent learning methods are applied [1]

  • Most multi-agent learning approaches are based on cooperative learning strategies rather than non-cooperative equilibrium, and most of them lack motion prediction and tracking capability

  • The reward function value of the MPT-NCE method is larger, and its tracking performance is improved by 25.85% compared to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method.


Introduction

In multi-agent intelligence, training failures such as non-convergence or low training speed are common if only individual-agent learning methods are applied [1]. Several improved methods have been proposed: the MAGNet method [3] refines the attention mechanism; the Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG) method [4] coordinates local and global rewards to speed up convergence; the MiniMax Multi-Agent Deep Deterministic Policy Gradient (M3DDPG) method [5] promotes multi-agent adaptation to the environment; and the CAA-MADDPG method [6] adds an attention mechanism. All of these multi-agent learning methods inherit cooperative learning strategies and show satisfactory convergence, but they cannot achieve effective prediction and tracking when facing unfamiliar environments; the expected self-adaptation in unfamiliar environments has not yet been achieved.
