Abstract

Reinforcement learning has shown remarkable success in various applications and, in some cases, even surpasses human performance. However, numerous challenges remain. In this paper, we introduce a novel approach that exploits the synergies between hierarchical reinforcement learning and distributional reinforcement learning to address complex sparse-reward tasks in which noisy state observations or non-stationary exogenous perturbations are present. Our proposed method uses a hierarchical policy structure in which rewards are modeled as random variables that follow a value distribution. This approach enables the handling of complex tasks and increases robustness to uncertainties arising from measurement noise or exogenous perturbations, such as wind. To achieve this, we extend the distributional soft Bellman operator and the temporal difference error to include the hierarchical structure, and we use quantile regression to approximate the reward distribution. We evaluate our method on a bipedal robot in the OpenAI Gym environment and on an electric autonomous vehicle in the SUMO traffic simulator. The results demonstrate the effectiveness of our approach in solving complex tasks under the aforementioned uncertainties compared to state-of-the-art methods. By handling uncertainties caused by noise and perturbations in challenging sparse-reward tasks, our approach could pave the way for more robust and effective reinforcement learning algorithms in real physical systems.
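
For context, the sketch below illustrates the standard quantile-regression (quantile Huber) loss used in distributional reinforcement learning to fit a value distribution; it is a minimal, generic example rather than the paper's hierarchical extension, and the function names and plain-NumPy formulation are illustrative assumptions.

```python
import numpy as np

def huber(u, kappa=1.0):
    """Elementwise Huber loss L_kappa(u)."""
    abs_u = np.abs(u)
    return np.where(abs_u <= kappa,
                    0.5 * u ** 2,
                    kappa * (abs_u - 0.5 * kappa))

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """
    Quantile-regression loss for approximating a return distribution
    (QR-DQN style), illustrative of the distributional TD error.

    pred_quantiles : (N,) predicted quantile values theta_i of the return.
    target_samples : (M,) samples of the distributional TD target.
    """
    n = pred_quantiles.shape[0]
    # Midpoint quantile fractions tau_i = (2i + 1) / (2N).
    taus = (np.arange(n) + 0.5) / n
    # Pairwise TD errors u_ij = target_j - theta_i.
    u = target_samples[None, :] - pred_quantiles[:, None]      # (N, M)
    # Asymmetric weight |tau_i - 1{u_ij < 0}| penalizes over/under-estimation.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))     # (N, M)
    return np.mean(weight * huber(u, kappa) / kappa)
```

Minimizing this loss drives the predicted quantiles toward the quantiles of the (stochastic) TD target, which is what makes the value estimate robust to noisy or perturbed returns.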
