Abstract
The MADDPG algorithm is widely used and relatively mature, but the values of some of its key parameters lack intuitive data support. This paper therefore studies how key parameters influence MADDPG performance in typical scenarios. First, three typical experimental scenarios were identified, namely collaborative cooperation, collaborative opposition, and collaborative pursuit, together with their basic parameters and hyperparameters. Then, a research plan based on the control-variable method was formulated to study the influence of the learning rate, the reward discount coefficient, and the reward function coefficient on algorithm performance. From comparisons over a large body of experimental data, the optimal value of each parameter in the three scenarios was obtained. The results showed that the optimal reward discount coefficient was the same in all three scenarios, indicating that it is relatively insensitive to scenario complexity. For the optimal learning rate, the general trend was that the lower-complexity collaborative cooperation scenario had a lower optimal value than the higher-complexity collaborative pursuit and collaborative opposition scenarios. As for the reward coefficient, large values in the collaborative cooperation and collaborative opposition scenarios worsened the convergence and convergence speed of the reward curve, whereas in the collaborative pursuit scenario the reward coefficient had a less significant impact on algorithm performance.