SYNTHESIS OF THE REWARD FUNCTION IN REINFORCED LEARNING WITH COGNITIVE GRAPHICS

S A Sakulin, A N Alfimtsev

doi:10.14489/vkit.2022.08.pp.026-036

Abstract

In recent years, reinforcement learning methods have become increasingly important in many applied areas. Such learning presupposes a reward function, and the reward should be higher the more closely the agent's behavior matches the desired behavior. In many cases, however, the reward function is built axiomatically: an expert selects one of the most widely used functions on the basis of a superficial analysis of the subject area. This situation stems from the cognitive difficulties an expert encounters when constructing reward functions, especially those with a large number of arguments. The reward function can nevertheless be represented as an aggregation operator, since the range of valid values of any criterion can be reduced to the unit interval by an appropriate linear transformation. The task of constructing a reward function can thus be reduced to the task of constructing an aggregation operator with given properties. To make the process of constructing aggregation operators intuitively clear, a method for visualizing them with 3D cognitive graphics has been developed. This article proposes a method for synthesizing the reward function for reinforcement learning that includes this visualization. The synthesis method comprises two procedures, each a sequence of steps in which the expert performs specific actions. An experiment was set up to test the effectiveness of the developed method. In this experiment, the reward function was synthesized, and agents were trained by reinforcement learning on the synthesized function, in the multi-agent machine learning environment of the StarCraft II computer game. Training was conducted both for the standard StarCraft II reward function and for the reward function built using the proposed procedure.
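The reduction of a reward function to an aggregation operator can be illustrated with a minimal sketch. It assumes a weighted arithmetic mean as the operator, chosen here only because it is one of the simplest aggregation operators; the abstract does not state which operator the authors synthesize. The function names, criterion ranges, and weights below are hypothetical.

```python
from typing import Sequence


def normalize(value: float, low: float, high: float) -> float:
    """Linearly map value from [low, high] onto the unit interval [0, 1].

    Out-of-range inputs are clamped (an assumption made for this sketch).
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def weighted_mean_reward(criteria: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Aggregate normalized criteria into one reward in [0, 1].

    The weighted arithmetic mean is an illustrative choice of aggregation
    operator, not the operator from the article. It is monotone in each
    argument and satisfies A(0, ..., 0) = 0 and A(1, ..., 1) = 1.
    """
    if len(criteria) != len(weights):
        raise ValueError("criteria and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * c for w, c in zip(weights, criteria)) / total


# Hypothetical criteria with made-up ranges: damage dealt in [0, 200]
# and remaining health in [0, 100], weighted equally.
damage = normalize(50.0, 0.0, 200.0)
health = normalize(75.0, 0.0, 100.0)
reward = weighted_mean_reward([damage, health], [1.0, 1.0])
```

In this sketch the expert's choices are confined to the criterion ranges and the weights; an operator with different properties (for example, one that penalizes a low value of any single criterion more strongly) would replace `weighted_mean_reward` without changing the normalization step.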

Full Text