Abstract

Reinforcement learning (RL) for dynamic asset allocation is an emerging field of study. Total return, the most common performance metric, is useful for comparing algorithms but does not tell us how close an RL algorithm is to an optimal solution. In real-world financial applications, a bad decision can be fatal, so we propose a methodology for assessing the quality of the actions taken by an RL agent, which can help portfolio managers better understand an investment RL policy. We present an extensive, in-depth study of RL algorithms for portfolio management (PM). We implemented eight published policy-based RL algorithms considered state of the art in game playing and were able to adapt five of them to PM, but found their performance unsatisfactory: most had difficulty converging during training because of the non-stationary and noisy nature of financial environments, among other challenges. Adapting RL algorithms to finance required unconventional changes. We developed a novel approach for encoding multi-type, multi-frequency financial data in a form compatible with RL, using a multi-channel convolutional neural network (CNN-RL) framework in which each channel corresponds to a specific type of data, such as high-low-open-close prices and volumes. We also designed a reward function based on concepts such as alpha, beta, and the Herfindahl-Hirschman index (HHI) that is financially meaningful while still being learnable by RL. Because portfolio managers typically blend time-series analysis with cross-sectional analysis before making a decision, we extend our approach to incorporate, for the first time, cross-sectional deep RL in addition to time-series RL. Finally, we evaluate the RL agents against commonly used passive and active trading strategies, including the uniform buy-and-hold (UBAH) index and a dynamic multi-period mean-variance optimization (MVO) model.
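The abstract names alpha, beta, and the Herfindahl-Hirschman index as reward ingredients and a multi-channel tensor as the state encoding, but gives neither the reward's functional form nor the tensor layout. Below is a minimal Python sketch consistent with those descriptions; the channel layout, normalization scheme, beta target of 1.0, and penalty coefficients `lam_beta` and `lam_hhi` are illustrative assumptions, not the authors' specification.

```python
import numpy as np

def build_state(ohlc: np.ndarray, volume: np.ndarray) -> np.ndarray:
    """Stack heterogeneous market data into an image-like tensor for a
    CNN: (channels, n_assets, window). ohlc has shape (4, n_assets,
    window); volume has shape (n_assets, window)."""
    # Normalize each series by its most recent value so that price and
    # volume channels share a comparable scale.
    price_ch = ohlc / ohlc[..., -1:]
    vol_ch = (volume / (volume[..., -1:] + 1e-8))[None]
    return np.concatenate([price_ch, vol_ch], axis=0)  # (5, n_assets, window)

def hhi(weights: np.ndarray) -> float:
    """Herfindahl-Hirschman index of portfolio weights: 1/N for an
    equal-weight portfolio, 1.0 when all capital sits in one asset."""
    return float(np.sum(np.square(weights)))

def alpha_beta(port_ret: np.ndarray, bench_ret: np.ndarray):
    """CAPM-style alpha and beta estimated over a rolling window of
    portfolio and benchmark returns."""
    beta = np.cov(port_ret, bench_ret)[0, 1] / np.var(bench_ret, ddof=1)
    alpha = port_ret.mean() - beta * bench_ret.mean()
    return alpha, beta

def reward(port_ret, bench_ret, weights, lam_beta=0.1, lam_hhi=0.1):
    """Hypothetical per-step reward: pay alpha, penalize beta drifting
    from a target of 1.0, and penalize concentration via the HHI."""
    a, b = alpha_beta(port_ret, bench_ret)
    return a - lam_beta * abs(b - 1.0) - lam_hhi * hhi(weights)
```

In a training loop, `build_state` would feed the CNN policy at each step and `reward` would score each rebalancing decision against a benchmark return series.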
