Abstract

The optimal stopping problem is concerned with finding an optimal policy to stop a stochastic process in order to maximize the expected return. This problem is central to stochastic control and arises in many different fields, such as operations research, finance, and healthcare. In this paper, we model the underlying stochastic process of the optimal stopping problem as a Markov decision process and propose a computationally efficient, model-free, value-based reinforcement learning approach, named ΔV-learning. The efficiency gain comes from incorporating the unique structural properties of the optimal stopping problem into our algorithm design. We consider two types of optimal stopping problems: standard optimal stopping and regenerative optimal stopping, which differ in their transition dynamics once the stopping action is executed. We conduct numerical experiments on our proposed method and compare its performance against existing reinforcement learning algorithms and rule-based policies. The results show that our ΔV-learning method outperforms the benchmark algorithms in all experiments.
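The abstract does not spell out the ΔV-learning update itself, so the sketch below is only a generic tabular Q-learning baseline for a toy standard optimal stopping MDP, not the paper's method. All problem parameters (the random-walk dynamics, the holding cost, and the stopping reward equal to the state index) are illustrative assumptions. It does illustrate the structural property the abstract alludes to: at every state there are exactly two actions, continue or stop, and in the standard variant stopping terminates the episode.

```python
import random

# Toy standard optimal stopping MDP (illustrative assumptions, not from
# the paper): states 0..N form a bounded random walk; "stop" pays the
# current state value, "continue" incurs a small holding cost and moves
# to a neighbouring state.
N = 20
HOLD_COST = 0.1
GAMMA = 0.99

def step(s):
    """One transition of the underlying chain under the continue action."""
    return max(0, min(N, s + random.choice([-1, 1])))

# Tabular Q-learning over the two actions {0: continue, 1: stop}.
Q = [[0.0, 0.0] for _ in range(N + 1)]
ALPHA, EPS, EPISODES = 0.1, 0.1, 20000

for _ in range(EPISODES):
    s = random.randint(0, N)
    while True:
        # Epsilon-greedy choice between continuing and stopping.
        a = random.randint(0, 1) if random.random() < EPS else int(Q[s][1] > Q[s][0])
        if a == 1:                        # stop: terminal reward, episode ends
            Q[s][1] += ALPHA * (s - Q[s][1])
            break
        s_next = step(s)                  # continue: pay holding cost, move on
        target = -HOLD_COST + GAMMA * max(Q[s_next])
        Q[s][0] += ALPHA * (target - Q[s][0])
        s = s_next

# The learned policy stops wherever stopping beats continuing.
stop_region = [s for s in range(N + 1) if Q[s][1] >= Q[s][0]]
print("stop in states:", stop_region)
```

For the regenerative variant mentioned in the abstract, the stopping action presumably resets the process rather than terminating it; in this sketch that would amount to replacing the `break` with a transition back to a regeneration state and continuing the episode.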
