Abstract

In this article we introduce the term Deep Execution, which applies deep reinforcement learning (DRL) to optimal execution. We demonstrate two approaches to solving the optimal execution problem: (1) the deep double Q-network (DDQN), a value-based approach, and (2) proximal policy optimization (PPO), a policy-based approach, for trading against and beating market benchmarks such as the time-weighted average price (TWAP). We show that, first, the DRL agent can reach the theoretically derived optimum by acting on the environment directly. Second, the DRL agents can learn to capitalize on price trends (alpha signals) without directly observing the price. Finally, the DRL agent can take advantage of the available information to create dynamic strategies, acting as an informed trader, and thus outperform static benchmark strategies such as TWAP.
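To make the contrast between the static benchmark and a dynamic policy concrete, the short Python sketch below illustrates the idea; it is not taken from the paper, and the policy interface, function names, and state contents are illustrative assumptions. A TWAP benchmark simply splits the parent order evenly across time buckets, whereas a DRL policy sizes each child order from the observed state.

import numpy as np

# Illustrative sketch only -- not the authors' implementation. `policy` and
# `state` are hypothetical placeholders for a trained DRL agent and its
# observation (e.g. remaining inventory, elapsed time, recent fills).

def twap_schedule(total_shares: int, n_intervals: int) -> np.ndarray:
    """Static benchmark: split the parent order evenly across time buckets."""
    base = total_shares // n_intervals
    schedule = np.full(n_intervals, base)
    schedule[: total_shares - base * n_intervals] += 1  # spread any remainder
    return schedule

def drl_child_order(policy, state, remaining_shares: int) -> int:
    """Dynamic strategy: the learned policy maps the current state to the
    next child-order size, so the schedule can adapt to market conditions."""
    fraction = policy(state)  # assumed to return a fraction of remaining inventory
    return min(remaining_shares, int(round(fraction * remaining_shares)))

print(twap_schedule(total_shares=10_000, n_intervals=8))  # -> [1250 1250 ... 1250]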
