Environment Representations with Bisimulation Metrics for Hierarchical Reinforcement Learning

Chao Zhang

doi:10.1109/iccrd56364.2023.10080162

Abstract

Hierarchical reinforcement learning has achieved good results on complex learning tasks. Its methods fall mainly into on-policy and off-policy approaches. On-policy methods are difficult to apply in practical scenarios because of their low data utilization, so off-policy methods have become the main direction of development. In the off-policy setting, however, the data in the replay buffer comes from different policies: for the same goal issued by the upper-level policy, the constantly updated lower-level policy reaches different states, so the upper-level policy cannot be trained stably. To address this problem, we propose a hierarchical reinforcement learning algorithm that learns an environment representation based on mutual bisimulation metrics. When training the upper-level policy, the lower-level policy is treated as part of the environment, which we call the virtual representation environment. Its output is used for feature extraction, and the extracted features serve as the state input to the upper-level policy network. Compared with current mainstream hierarchical reinforcement learning methods on a variety of complex tasks, the proposed method improves training stability and performance.
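To make the idea of a bisimulation-based representation more concrete, the following is a minimal sketch of a generic bisimulation-metric embedding loss. It is not taken from the paper. It assumes the commonly used formulation in which the L1 distance between two state embeddings is regressed onto the reward difference plus a discounted 2-Wasserstein distance between diagonal-Gaussian predictions of the next latent state. The function name `bisimulation_loss` and all of its parameters are hypothetical, and the paper's "mutual" variant may differ.

```python
import numpy as np

def bisimulation_loss(z, rewards, next_mu, next_sigma, gamma=0.99, perm=None):
    """Generic bisimulation-metric embedding loss (illustrative sketch only).

    z          : (N, D) state embeddings from a feature extractor
    rewards    : (N,)   rewards observed for each state
    next_mu    : (N, D) predicted mean of the next latent state
    next_sigma : (N, D) predicted std. dev. of the next latent state
    perm       : optional index array pairing each sample i with sample perm[i]
    """
    if perm is None:
        # Assumption: pair each sample with a random other sample in the batch.
        perm = np.random.default_rng(0).permutation(len(z))
    perm = np.asarray(perm)

    # Distance between the paired embeddings (L1 norm).
    z_dist = np.abs(z - z[perm]).sum(axis=1)

    # Difference in immediate reward.
    r_dist = np.abs(rewards - rewards[perm])

    # Closed-form 2-Wasserstein distance between diagonal Gaussians.
    w2 = np.sqrt(
        ((next_mu - next_mu[perm]) ** 2).sum(axis=1)
        + ((next_sigma - next_sigma[perm]) ** 2).sum(axis=1)
    )

    # Bisimulation target: states are close if their rewards and
    # predicted next-state distributions are close.
    target = r_dist + gamma * w2
    return float(np.mean((z_dist - target) ** 2))
```

In the setting described in the abstract, `z` would presumably be the features extracted from the output of the virtual representation environment and passed to the upper-level policy as its state. In an actual training loop the loss would be minimized with an automatic-differentiation framework rather than evaluated with NumPy.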

Full Text