Development of an algorithm for managing a multi-robot system for cargo transportation based on reinforcement learning in a virtual environment

L A Rybak, L Behera, A V Sapryka, M A Averbukh

doi:10.1088/1757-899x/945/1/012083

Abstract

This paper studies two variants of the Q-learning algorithm: deep Q-networks (DQN) and dueling Q-networks. Both belong to the family of reinforcement learning algorithms, and suitable neural network architectures are selected for them. The paper describes how the robot's operation is modeled in a cargo delivery task, in which cargo is carried from a random point A to a green zone. It also describes how the robot gathers information about its environment using the Raycast method. A block diagram for controlling the robot's movement has been developed. It consists of a positioning and state sensor block, a neural network module, and a trajectory construction block, and the last two blocks together form a system that automatically controls the agent's movement in the external environment. Modeling was performed in the Unity development environment, with the Unity ML-Agents toolkit used to work with the learning agents. This toolkit is built on a modern deep reinforcement learning (DRL) engine based on the proximal policy optimization (PPO) algorithm. The agent and the environment are structurally simplified so that the training scenes are easier to reproduce. An algorithm for training the robot in a random environment is presented, and the optimal parameters of the algorithms under consideration are selected. Finally, suggestions are made for improving the algorithms' performance on this problem.
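The two Q-learning variants named in the abstract differ mainly in how the network output is formed. A dueling network splits its head into a state value V(s) and per-action advantages A(s, a) and recombines them into Q-values. Both variants are trained toward the same DQN bootstrap target. The sketch below illustrates only these two formulas, assuming plain Python lists stand in for network outputs. The function names `dueling_q_values`, `td_target` and `greedy_action` are hypothetical; they are not taken from the paper or from the Unity ML-Agents API.

```python
from typing import List


def dueling_q_values(value: float, advantages: List[float]) -> List[float]:
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    Uses the mean-subtracted aggregation of dueling networks:
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def td_target(reward: float, next_q_target: List[float],
              gamma: float, done: bool) -> float:
    """DQN bootstrap target y = r + gamma * max_a' Q_target(s', a').

    At the end of an episode (done=True) there is no next state,
    so the target is just the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def greedy_action(q_values: List[float]) -> int:
    """Index of the action with the highest Q-value (ties -> lowest index)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


if __name__ == "__main__":
    # Hypothetical outputs of a dueling head for three actions.
    q = dueling_q_values(1.0, [0.5, -0.5, 0.0])
    print(q)  # [1.5, 0.5, 1.0]
    print(greedy_action(q))  # 0
    print(td_target(1.0, q, gamma=0.99, done=False))
```

Subtracting the mean advantage makes the split into V and A identifiable, because adding a constant to every advantage would otherwise leave Q unchanged. In a full DQN, `next_q_target` would come from a separate, periodically updated target network, and the squared difference between Q(s, a) and `td_target` would be the training loss.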

Full Text