Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes

Chao Qu, Xiaoming Shi, Xiaoyu Tan, James Zhang, Siqiao Xue, Hongyuan Mei

doi:10.1609/aaai.v37i8.26142

Abstract

We consider a sequential decision-making problem in which an agent faces an environment characterized by stochastic discrete events and seeks an intervention policy that maximizes its long-term reward. This problem arises widely in social media, finance, and health informatics but is rarely studied in conventional reinforcement learning research. To this end, we present a novel model-based reinforcement learning framework in which the agent's actions and observations are asynchronous stochastic discrete events occurring in continuous time. We model the environment dynamics with a Hawkes process that includes an external intervention control term, and we develop an algorithm that embeds this process in the Bellman equation, which guides the direction of the value gradient. We demonstrate the superiority of our method in both synthetic-simulator and real-data experiments.
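
As a rough illustration of the kind of environment model the abstract describes, the sketch below simulates a univariate Hawkes process with an exponential kernel and an additive intervention term, using Ogata's thinning algorithm. The abstract does not give the paper's parameterization, so the intensity form, the function names (`intensity`, `simulate`), the parameter names (`mu`, `alpha`, `beta`, `control`), and the choice of a constant control input are all assumptions made for illustration, not the authors' model.

```python
import math
import random


def intensity(t, events, mu, alpha, beta, control):
    """Conditional intensity at time t given strictly earlier event times.

    Assumed form (hypothetical, not taken from the paper):
        lambda(t) = mu + control + alpha * sum_i exp(-beta * (t - t_i))
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)
    return mu + control + excitation


def simulate(horizon, mu=0.5, alpha=0.8, beta=1.5, control=0.0, seed=0):
    """Sample event times on [0, horizon) with Ogata's thinning algorithm.

    Assumes alpha > 0, beta > 0, and mu + control >= 0 so that the
    intensity stays non-negative and the bound below stays positive.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The intensity only decays between events, so its current value
        # bounds it from above; the extra alpha covers an event exactly at t.
        upper = intensity(t, events, mu, alpha, beta, control) + alpha
        t += rng.expovariate(upper)  # candidate time from the bounding process
        if t >= horizon:
            return events
        # Accept the candidate with probability lambda(t) / upper.
        if rng.random() * upper <= intensity(t, events, mu, alpha, beta, control):
            events.append(t)
```

Calling `simulate(200.0, control=2.0)` instead of `simulate(200.0)` raises the baseline rate and, through self-excitation, the overall event count. In a reinforcement learning setting the control would be chosen by a policy rather than held constant.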

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 26, 2023
Citations: 4

Similar Papers

Learning Context-Sensitive Strategies in Space Fortress
Akshat Agarwal ... Ryan Hope
01 Oct 2019

Entropy regularized reinforcement learning using large deviation theory
Argenis Arriojas ... Jacob Adamczyk
Physical Review Research | VOL. 5
10 May 2023

Optimal Rate Control for Latency-constrained High Throughput Big Data Applications
Ziren Xiao ... Aaron Harwood
17 Dec 2022

Research on Portfolio Optimization Models Using Deep Deterministic Policy Gradient
Li Wei ... Zhang Weiwei
01 Nov 2020
