Abstract

Automated trading policies have been studied with reinforcement learning (RL), and designing a profitable, practically applicable policy is of great significance for research in quantitative finance. By incorporating deep learning, deep reinforcement learning (DRL) algorithms such as Proximal Policy Optimization (PPO) have shown their effectiveness. To improve practical applicability, the way a model is trained should better reflect the dynamics of the stock market. A sliding-window training strategy addresses this by running training, validation, and trading procedures over the dataset with a sliding window. However, this empirical strategy still warrants further investigation with respect to algorithm evaluation and experimental design, such as the choice of the sliding window. In this paper, we further investigate the continuous trading strategy (CTS). We evaluate the performance of a wider range of algorithms, including PPO, Deep Deterministic Policy Gradient (DDPG), Advantage Actor-Critic (A2C), and Soft Actor-Critic (SAC), and we train our models over a longer time period. We also provide detailed observations and suggestions on experiment settings, which will facilitate researchers in their future work.
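
The following is a minimal sketch (not from the paper) of how rolling train/validation/trade splits can be generated for a sliding-window strategy of the kind described above. The window lengths, the 63-day roll step, and the `sliding_windows` helper are illustrative assumptions, not the authors' settings.

```python
# Hypothetical sketch: rolling train/validation/trade windows over a daily calendar,
# as used in sliding-window (continuous) trading strategies. All lengths are assumptions.
from dataclasses import dataclass
from typing import Iterator

import pandas as pd


@dataclass
class Window:
    train: pd.DatetimeIndex       # period used to fit the DRL agent (e.g. PPO, DDPG, A2C, SAC)
    validation: pd.DatetimeIndex  # period used to select the best agent / hyperparameters
    trade: pd.DatetimeIndex       # out-of-sample period where the selected agent trades


def sliding_windows(dates: pd.DatetimeIndex,
                    train_len: int = 756,   # ~3 trading years (assumption)
                    valid_len: int = 63,    # ~1 quarter (assumption)
                    trade_len: int = 63,    # ~1 quarter (assumption)
                    step: int = 63) -> Iterator[Window]:
    """Yield consecutive train/validation/trade splits, rolling forward by `step` days."""
    total = train_len + valid_len + trade_len
    start = 0
    while start + total <= len(dates):
        t1 = start + train_len
        v1 = t1 + valid_len
        d1 = v1 + trade_len
        yield Window(train=dates[start:t1],
                     validation=dates[t1:v1],
                     trade=dates[v1:d1])
        start += step


if __name__ == "__main__":
    # Hypothetical business-day calendar; in practice this would be the trading
    # calendar of the market being studied.
    calendar = pd.bdate_range("2010-01-01", "2021-12-31")
    for w in sliding_windows(calendar):
        print(w.train[0].date(), "->", w.trade[-1].date())
```

In such a setup, only the trade window of each roll contributes to the reported backtest, while the train and validation windows are refreshed as the window slides forward, which is what lets the procedure track the changing market dynamics.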
