Abstract

In this article, we propose an end-to-end adaptive framework for optimal trade execution based on Proximal Policy Optimization (PPO). We account for time dependencies in the market data with two different neural network architectures: 1) long short-term memory (LSTM) networks, and 2) fully connected networks (FCN) that take the most recent limit order book (LOB) snapshots stacked together as input. The proposed framework makes trade execution decisions directly from level-2 LOB information such as bid/ask prices and volumes, without the manually designed attributes used in previous research. Furthermore, instead of implementation shortfall (IS) or a shaped reward function, we use a sparse reward that gives the agent a signal only at the end of each episode, indicating its performance relative to a baseline model. The experimental results demonstrate advantages over IS and the shaped reward function in terms of both performance and simplicity. The proposed framework outperforms baseline models commonly used in industry, such as TWAP, VWAP, and Almgren-Chriss (AC), as well as several deep reinforcement learning (DRL) models, on most of the 14 US equities in our experiments.
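For intuition, the sparse reward described above could be computed along the following lines. This is a minimal sketch, not the paper's implementation: it assumes a sell-side parent order and a TWAP baseline, and all function and variable names are hypothetical.

```python
import numpy as np

def sparse_terminal_reward(agent_prices, agent_volumes,
                           baseline_prices, baseline_volumes):
    """Hypothetical sparse reward: zero at every intermediate step,
    non-zero only at the end of the episode, where it measures the
    agent's volume-weighted execution price against a baseline
    (e.g. TWAP). For a sell order, a higher average price is better."""
    agent_vwap = np.dot(agent_prices, agent_volumes) / np.sum(agent_volumes)
    baseline_vwap = np.dot(baseline_prices, baseline_volumes) / np.sum(baseline_volumes)
    # Relative outperformance in basis points, emitted once per episode.
    return 1e4 * (agent_vwap - baseline_vwap) / baseline_vwap
```

Because the signal arrives only at the terminal step, the agent is judged on the outcome of the whole execution schedule rather than on hand-crafted per-step penalties, which is the simplicity advantage claimed over IS and shaped rewards.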
