Abstract

In recent years, reinforcement learning methods have become increasingly important in many applied areas. Such learning assumes the presence of a reward function: the more closely the agent's behavior matches the desired behavior, the higher the reward should be. In many cases, however, the reward function is built axiomatically, with an expert selecting one of the most widely used functions on the basis of a superficial analysis of the subject area. This situation arises from the cognitive difficulties an expert encounters when constructing reward functions, especially functions of a large number of arguments. The reward function can nevertheless be represented as an aggregation operator, since the range of valid values of any criterion can be reduced to the unit interval by an appropriate linear transformation. Thus, the task of constructing a reward function can be reduced to the task of constructing an aggregation operator with given properties. To make the process of constructing aggregation operators intuitively clear, a method for visualizing them with 3D cognitive graphics has been developed. This article proposes a method for synthesizing the reward function for reinforcement learning that includes this visualization. The synthesis method consists of two procedures, each a sequence of steps in which the expert must take specific actions. An experiment was set up to test the effectiveness of the developed method. In this experiment, the reward function was synthesized, and agents were trained by reinforcement learning on the synthesized function, in the multiagent machine learning environment of the StarCraft II computer game. Training was conducted both for the standard StarCraft II reward function and for the reward function built using the proposed procedure.
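The reduction described in the abstract (scale each criterion to the unit interval, then combine the scaled values with an aggregation operator) can be illustrated with a minimal sketch. The abstract does not say which operator the authors synthesize, so the weighted arithmetic mean below is an assumption chosen only because it is a simple, monotone aggregation operator that maps all-zero inputs to 0 and all-one inputs to 1. The function names, criterion bounds, and weights are likewise hypothetical.

```python
from typing import Sequence, Tuple


def normalize(value: float, low: float, high: float) -> float:
    """Linearly map value from [low, high] onto [0, 1], clipping out-of-range input."""
    if high <= low:
        raise ValueError("high must be strictly greater than low")
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def weighted_mean_reward(
    values: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
    weights: Sequence[float],
) -> float:
    """Aggregate normalized criteria with a weighted arithmetic mean.

    Assumption: the weighted mean stands in for whatever aggregation
    operator an expert would actually construct.
    """
    if not (len(values) == len(bounds) == len(weights)):
        raise ValueError("values, bounds and weights must have equal length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    return sum(
        w * normalize(v, lo, hi)
        for v, (lo, hi), w in zip(values, bounds, weights)
    ) / total


if __name__ == "__main__":
    # Hypothetical criteria: damage dealt in [0, 200], health kept in [0, 100].
    bounds = [(0.0, 200.0), (0.0, 100.0)]
    weights = [2.0, 1.0]
    print(weighted_mean_reward([150.0, 40.0], bounds, weights))
```

Because every input is first clipped to [0, 1] and the weights are non-negative, the result always lies in [0, 1] and never decreases when any single criterion improves, which are the boundary and monotonicity properties usually required of an aggregation operator.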
