Abstract
Unlike conventional reinforcement learning (RL), which estimates only the expectation of the return, distributional RL treats the return as a random variable and estimates its full distribution. Because the return distribution carries more information than its expectation, distributional RL has been widely studied. However, few previous works take full advantage of the learned distribution to improve distributional RL itself. This paper improves distributional RL by introducing epistemic and aleatoric uncertainty estimation. First, we introduce a method for estimating epistemic and aleatoric uncertainty using deep ensembles and the learned value distribution. Next, we improve the exploration efficiency of the fully parameterized quantile function (FQF) algorithm for distributional RL and obtain an FQF-U (uncertainty) algorithm. Then, to overcome the limitation that such distributional RL methods cannot operate over continuous control tasks, we propose an epistemic-uncertainty-based distributional soft actor-critic algorithm with an adaptive risk-averse and risk-seeking policy. Finally, experimental results show that our algorithms outperform the baselines on Atari games and in Multi-Joint dynamics with Contact (MuJoCo) environments.
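As a minimal sketch of how epistemic and aleatoric uncertainty can be separated from an ensemble of learned value distributions, consider the decomposition below. It assumes quantile-based return distributions (as in FQF); the function name, array shapes, and the particular variance-based decomposition are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np


def uncertainty_from_ensemble(quantile_values):
    """Decompose uncertainty from an ensemble of learned quantile distributions.

    quantile_values: array of shape (n_ensemble, n_quantiles) holding the
    quantile estimates of the return for one state-action pair, one row per
    ensemble member.
    """
    quantile_values = np.asarray(quantile_values)

    # Mean return predicted by each ensemble member (expectation over quantiles).
    member_means = quantile_values.mean(axis=1)

    # Epistemic uncertainty: disagreement between ensemble members about the
    # expected return; it shrinks as the members converge with more data.
    epistemic = member_means.var()

    # Aleatoric uncertainty: average spread of each member's return
    # distribution (variance over its quantiles), i.e. the intrinsic
    # randomness of the return that more data cannot remove.
    aleatoric = quantile_values.var(axis=1).mean()

    return epistemic, aleatoric


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy example: 5 ensemble members, 32 quantiles each.
    fake_quantiles = rng.normal(loc=1.0, scale=0.5, size=(5, 32))
    print(uncertainty_from_ensemble(fake_quantiles))
```

Estimates of this kind can then drive exploration (e.g., favoring actions with high epistemic uncertainty) or an adaptive risk-averse/risk-seeking policy, in the spirit of the algorithms described above.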