Abstract
In traditional Reinforcement Learning (RL), agents learn to optimize actions in a dynamic context based on recursive estimation of expected values. We show that this form of machine learning fails when rewards (returns) are affected by tail risk, i.e., leptokurtosis. Here, we adapt a recent extension of RL, called distributional RL (disRL), and introduce estimation efficiency, while properly adjusting for differential impact of outliers on the two terms of the RL prediction error in the updating equations. We show that the resulting “efficient distributional RL” (e-disRL) learns much faster, and is robust once it settles on a policy. Our paper also provides a brief, nontechnical overview of machine learning, focusing on RL.
Highlights
Reinforcement Learning (RL) has been successfully applied in diverse domains
We prove the superiority of efficient distributional RL (e-disRL) over Temporal Difference (TD) Learning and disRL
To disentangle the effect of separating the two terms of the prediction error from the effect of efficient estimation of the mean, we proceed in stages: we first report results for an estimator that implements only the separation, while continuing to use the sample average as the estimator of expected rewards, and then for an estimator that both separates the components of the TD error and applies efficient estimation when calculating the mean of the empirical distribution of rewards
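The staged comparison above can be illustrated with a minimal sketch. The contrast is between the plain sample average and an outlier-robust estimator of the mean of rewards; the trimmed mean used here is only a stand-in for illustration, as the paper's specific efficient estimator is not detailed in this excerpt.

```python
import numpy as np

def sample_average(rewards):
    """Stage 1 baseline: ordinary sample mean of observed rewards."""
    return float(np.mean(rewards))

def trimmed_mean(rewards, trim=0.2):
    """Stage 2 stand-in: a robust mean that discards a fraction of
    extreme observations on each tail. (Illustrative only; the paper's
    actual efficient estimator may differ.)"""
    r = np.sort(np.asarray(rewards, dtype=float))
    k = int(len(r) * trim)
    return float(r[k:len(r) - k].mean()) if len(r) > 2 * k else float(r.mean())

# A leptokurtic-looking sample: one tail outlier dominates the plain mean
rewards = [0.0, 1.0, 2.0, 3.0, 100.0]
print(sample_average(rewards))   # pulled up by the outlier
print(trimmed_mean(rewards))     # closer to the bulk of the distribution
```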
Summary
Reinforcement Learning (RL) has been successfully applied in diverse domains, but the domain of finance remains a challenge. We focus on one version of TD Learning, called SARSA, whereby the agent takes the action in the subsequent trial to be the one deemed optimal given the new state, i.e., the action that provides the maximum estimated Q value given that state. New estimates of the Q values of state-action pairs are obtained by taking the expectation over this empirical distribution. This technique, referred to as Distributional RL (disRL), has been more successful than traditional, recursive TD Learning in contexts such as games, where the state space is large and the relation between states and action values is complex.
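The update described in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tabular Q array, the step size alpha, and the discount gamma are assumptions, and the next action is chosen greedily as the summary describes.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One update of the state-action value estimate.

    The next action is the one with the maximum estimated Q value in the
    new state (per the summary); the TD error is the reward-plus-bootstrap
    term minus the current estimate, and Q is nudged toward it.
    """
    a_next = int(np.argmax(Q[s_next]))                 # greedy next action
    td_error = r + gamma * Q[s_next, a_next] - Q[s, a]  # prediction error
    Q[s, a] += alpha * td_error                         # recursive update
    return a_next

# Usage: a 2-state, 2-action table, one observed transition
Q = np.zeros((2, 2))
sarsa_update(Q, s=0, a=0, r=1.0, s_next=1)
print(Q[0, 0])  # estimate moves toward the observed reward
```

It is exactly the two terms of this TD error, the observed reward plus bootstrap value and the current estimate, that the paper proposes to treat separately when rewards are leptokurtic.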