Abstract
This paper uses reinforcement learning (RL) to approximate the policy rules of banks participating in a high-value payment system (HVPS). The objective of the RL agents is to learn a policy function governing the amount of liquidity provided to the system at the beginning of the day and the rate at which intraday payments are made. Individual choices have complex strategic effects that preclude a closed-form solution of the optimal policy, except in simple cases. We show that in a stylized two-agent setting, RL agents learn the optimal policy that minimizes the cost of processing their individual payments, even without complete knowledge of the environment. We further demonstrate that in more complex settings, both agents learn to reduce the cost of processing their payments and respond effectively to the liquidity-delay trade-off. Our results show the potential of RL to solve liquidity management problems in HVPSs and provide new tools to assist policymakers in their mandates of ensuring the safety and improving the efficiency of payment systems.