Order execution is a fundamental problem in finance, and reinforcement learning (RL) has recently attracted growing attention as a way to solve it. Conventional RL methods face several difficulties in this setting, such as the large action space spanning both price and quantity, and the long decision horizon. Since order execution naturally decomposes into a low-frequency volume scheduling stage and a high-frequency order placement stage, most existing RL-based order execution methods treat the two stages as distinct tasks and address only one of them, offering partial solutions. However, the current literature fails to model the non-negligible mutual influence between these two tasks, leading to impractical order execution solutions. To address these limitations, we propose OEHRL, a novel automatic order execution approach based on hierarchical RL that jointly learns the policies for volume scheduling and order placement. OEHRL first extracts state embeddings at both the macro and micro levels with a sequential variational auto-encoder. From these embeddings, OEHRL generates a hindsight expert dataset, which is used to train a hierarchical order execution policy. In the hierarchical structure, the high-level policy determines the target volume, while the low-level policy learns to set prices for the series of sub-orders allocated by the high level. The two levels collaborate seamlessly, jointly yielding the final order execution policy. Extensive experiments on 200 stocks across the US and China A-share markets validate the effectiveness of the proposed approach.
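To make the division of labor between the two levels concrete, the following is a minimal structural sketch of such a hierarchical policy. All module names, network sizes, embedding dimensions, and the discretization of the action spaces are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a two-level order execution policy: a low-frequency
# high-level volume scheduler and a high-frequency low-level order placer.
import torch
import torch.nn as nn


class HighLevelPolicy(nn.Module):
    """Maps a macro-level state embedding to a distribution over discrete
    volume fractions of the remaining order (volume scheduling)."""

    def __init__(self, macro_dim=32, n_volume_bins=11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(macro_dim, 64), nn.ReLU(), nn.Linear(64, n_volume_bins)
        )

    def forward(self, macro_state):
        return torch.distributions.Categorical(logits=self.net(macro_state))


class LowLevelPolicy(nn.Module):
    """Conditioned on the micro-level state embedding and the volume target
    allocated by the high level, outputs a distribution over discrete
    relative price levels for each sub-order (order placement)."""

    def __init__(self, micro_dim=32, n_price_levels=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(micro_dim + 1, 64), nn.ReLU(), nn.Linear(64, n_price_levels)
        )

    def forward(self, micro_state, volume_target):
        x = torch.cat([micro_state, volume_target.unsqueeze(-1)], dim=-1)
        return torch.distributions.Categorical(logits=self.net(x))


# One scheduling window: the high level picks a volume fraction, then the
# low level prices a series of sub-orders within that window. The random
# state embeddings stand in for the sequential VAE outputs.
high, low = HighLevelPolicy(), LowLevelPolicy()
macro_state = torch.randn(32)
volume_bin = high(macro_state).sample()        # bin 0..10 -> 0%..100%
volume_target = volume_bin.float() / 10.0
for _ in range(4):                             # sub-orders in this window
    micro_state = torch.randn(32)
    price_level = low(micro_state, volume_target).sample()
```

In this sketch, the mutual influence between the two stages enters through the conditioning of the low-level policy on the high-level volume target; how OEHRL actually couples the levels and trains them from the hindsight expert dataset is detailed in the full paper.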