Abstract

We propose a reinforcement learning (RL) approach to computing the quasi-stationary distribution. Based on the fixed-point formulation of the quasi-stationary distribution, we minimize the KL-divergence between the two Markovian path distributions induced by a candidate distribution and the true target distribution. To solve this challenging minimization problem by gradient descent, we apply a reinforcement learning technique, introducing reward and value functions. We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution together with the value function. Numerical examples on finite-state Markov chains are presented to demonstrate the new method.
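As a rough illustration of the algorithmic ingredients named above (reward, value function, policy gradient, actor-critic), the following is a minimal sketch of a generic tabular average-reward actor-critic loop on a finite state space, not the authors' algorithm. The dynamics and reward in `step`, the softmax parameterization `policy`, the learning rates, and the state-space size are all illustrative placeholders.

```python
import numpy as np

# Generic tabular average-reward actor-critic on a finite state space.
# The environment `step` and its reward are placeholders, NOT the reward
# derived in the paper; they only stand in for the killed-chain setting.
rng = np.random.default_rng(0)
n_states = 5

theta = np.zeros(n_states)   # actor parameters: softmax-parameterized distribution over states
V = np.zeros(n_states)       # critic: tabular value function
avg_reward = 0.0             # running estimate of the time-averaged reward
lr_actor, lr_critic, lr_avg = 0.01, 0.05, 0.01

def policy(theta):
    """Candidate distribution over the finite states, parameterized by a softmax."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def step(state, action, rng):
    """Placeholder dynamics and reward (hypothetical, for illustration only)."""
    next_state = rng.integers(n_states)
    reward = -abs(int(action) - int(state)) / n_states
    return next_state, reward

state = 0
for t in range(10_000):
    probs = policy(theta)
    action = rng.choice(n_states, p=probs)        # sample from the candidate distribution
    next_state, reward = step(state, action, rng)

    # Differential TD error of the average-reward setting.
    td_error = reward - avg_reward + V[next_state] - V[state]
    avg_reward += lr_avg * td_error               # track the time-averaged reward
    V[state] += lr_critic * td_error              # critic update

    # Actor (policy-gradient) update: grad of log softmax is one_hot(action) - probs.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += lr_actor * td_error * grad_log_pi

    state = next_state

print("learned candidate distribution:", policy(theta))
```

The softmax parameterization keeps the candidate distribution on the probability simplex, which is why the actor update uses the log-softmax gradient rather than updating the probabilities directly.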

Highlights

  • The quasi-stationary distribution (QSD) describes the long-time statistical behavior of a stochastic process that is almost surely killed, conditioned on the process surviving [1]

  • We focus on computing the quasi-stationary distribution, denoted by α(x), on a metric space E

  • Before introducing the reinforcement learning (RL) method for our QSD problem, we develop a general formulation based on the KL-divergence between two path distributions (the standard forms are recalled right after this list)
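For orientation, these are the textbook forms the highlights refer to, not formulas quoted from the paper: the QSD is invariant under conditioning on survival, and the KL-divergence between two path distributions has the usual Radon-Nikodym form.

```latex
% Standard definitions; alpha and E as in the highlights, tau is the killing time.
% alpha is a QSD for the killed process (X_t) if conditioning on survival leaves it invariant:
\mathbb{P}_{\alpha}\!\left( X_t \in A \mid \tau > t \right) = \alpha(A),
\qquad A \subseteq E \ \text{measurable},\quad t \ge 0 .

% KL-divergence between two path distributions P and Q (with P absolutely continuous w.r.t. Q):
D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_{P}\!\left[ \log \frac{dP}{dQ} \right].
```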


Summary

Introduction

The quasi-stationary distribution (QSD) describes the long-time statistical behavior of a stochastic process that is almost surely killed, conditioned on the process surviving [1]. Traditional numerical linear algebra methods can be applied to compute the quasi-stationary distribution on a finite state space, for example the power method [16], the multi-grid method [17], and Arnoldi's algorithm [18]. These eigenvector methods produce a stochastic vector representing the QSD rather than generating samples from it. Motivated by the use of RL for rare-event sampling problems, we transform the minimization of the KL divergence between P and Q into the maximization of a time-averaged reward function and define the corresponding value function V(x) at each state x. This completes our modeling of RL for the quasi-stationary distribution problem.
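To make the eigenvector viewpoint concrete, here is a small self-contained sketch of the power method: for a finite-state killed chain, the QSD is the normalized left principal eigenvector of the transition matrix restricted to the surviving states, and power iteration with renormalization converges to it under the usual irreducibility and aperiodicity assumptions. The 3-state sub-stochastic matrix below is made up for illustration and is not an example from the paper.

```python
import numpy as np

# Sub-stochastic transition matrix among the surviving (non-killed) states.
# Row sums are below 1; the deficit is the per-step killing probability.
P = np.array([
    [0.50, 0.30, 0.10],   # state 0: killed with probability 0.10
    [0.20, 0.50, 0.20],   # state 1: killed with probability 0.10
    [0.10, 0.30, 0.40],   # state 2: killed with probability 0.20
])

def qsd_power_method(P, tol=1e-12, max_iter=100_000):
    """Approximate the QSD as the normalized left principal eigenvector of P."""
    alpha = np.full(P.shape[0], 1.0 / P.shape[0])   # start from the uniform distribution
    for _ in range(max_iter):
        nxt = alpha @ P          # one step of the killed chain
        nxt /= nxt.sum()         # renormalize over the survivors
        if np.max(np.abs(nxt - alpha)) < tol:
            break
        alpha = nxt
    return alpha

alpha = qsd_power_method(P)
print("QSD estimate:", alpha)
# Sanity check: alpha @ P is proportional to alpha (left-eigenvector relation).
print("normalized alpha @ P:", (alpha @ P) / (alpha @ P).sum())
```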

Quasi-Stationary Distribution
Review of Simulation Methods for Quasi-Stationary Distribution
Learn Quasi-Stationary Distribution
Formulation of RL and Policy Gradient Theorem
Learn QSD
Actor-Critic Algorithm
Numerical Experiment
Loopy Markov Chain
Summary and Conclusions