Abstract

In this paper, we focus on statistical arbitrage trading opportunities involving the continuous exploitation of price differences arising during an intraday trading period with the option of closing positions on the balancing market. We aim to maximise the reward–risk ratio of an autonomous trading strategy. To find an optimal trading policy, we propose utilising the asynchronous advantage actor–critic (A3C) algorithm, a deep reinforcement learning method, with function approximators of two-headed shared deep neural networks. We enforce a risk-constrained trading strategy by limiting the maximum allowed position, and conduct state engineering and selection processes. We introduce a novel reward function and goal-based exploration, i.e. behaviour cloning. Our methodology is evaluated on a case study using the limit order book of the European single intraday coupled market (SIDC) available for the Dutch market area. The majority of hourly products on the test set return a profit. We expect our study to benefit electricity traders, renewable electricity producers and researchers who seek to implement state-of-art intelligent trading strategies.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call