Abstract

This paper studies two variants of the Q-learning algorithm: deep Q-networks and dueling Q-networks. Both belong to the family of reinforcement learning algorithms. Neural network architectures are selected for each. The paper describes how the robot’s operation is modeled in a cargo delivery task, in which cargo must be carried from a random point A to the green zone, and how the robot obtains information about its environment using the Raycast method. A block diagram for controlling the robot’s movement is developed; it consists of a block of positioning and state sensors, a neural network module, and a trajectory construction block. The last two blocks together form a system for automatically controlling the agent’s movement in the external environment. Modeling was performed in the Unity development environment, using the Unity ML-Agents toolkit to work with ML agents. This toolkit is built on a modern deep reinforcement learning (DRL) engine based on the proximal policy optimization (PPO) algorithm. The agent and environment are structurally simplified to make the training scenes easier to reproduce. An algorithm for training the robot in a random environment is presented. The optimal parameters of the algorithms under consideration are selected, and suggestions are made for improving their performance on this problem.
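
To make the difference between the two Q-learning variants concrete, the following is a minimal sketch of the commonly used dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), together with the one-step Q-learning target. The abstract does not specify the networks or hyperparameters used in the paper, so the function names, the discount factor, and the sample numbers here are illustrative assumptions rather than the authors' implementation.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Uses the mean-subtracted form Q = V + (A - mean(A)), which is the
    usual way a dueling head merges its two streams. (Assumed here; the
    abstract does not state which aggregation the paper uses.)
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def td_target(reward: float, next_q: List[float],
              gamma: float = 0.99, done: bool = False) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    gamma is a hypothetical discount factor; terminal transitions
    (done=True) use the reward alone.
    """
    if done:
        return reward
    return reward + gamma * max(next_q)


# Illustrative numbers only: three actions with advantages 1, 0, -1.
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
target = td_target(1.0, q_values, gamma=0.5)
```

A plain deep Q-network would output the list of Q-values directly; the dueling variant differs only in producing them from separate value and advantage estimates, while the training target is computed the same way in both cases.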
