Improving algorithmic trading consistency via human alignment and imitation learning

Yuling Huang, Kai Cui, Xiaoping Lu, Chujin Zhou

doi:10.1016/j.eswa.2024.124350

Abstract

Research on algorithmic trading using reinforcement learning has become increasingly popular in recent years. Most current reinforcement learning methods train the agent to address a particular modeling or data problem, but in applications as critical as financial trading it is also worthwhile to explore aligning agents with human behavior. Achieving such consistency by incorporating human expert experience into agent behavior is key to potential improvements in this field. Imitation learning learns directly from examples of humans or other agents performing tasks. However, imitation learning alone tends to overfit the expert example data. By combining the advantages of imitation learning and the Advantage Actor–Critic method, the Human Alignment Advantage Actor–Critic (HA3C) algorithm is proposed to enhance single-asset trading strategies. First, daily and weekly frequency trading data are added as input features to TimesNet, which is specifically designed to extract correlated temporal patterns from time-series data, so that the model captures both short-term and long-term features and represents the time series more comprehensively. Second, an expert action labeling method is proposed to train a strategy prediction network through supervised behavior imitation. Third, the pre-trained strategy network is transferred to balance the exploration and exploitation of the agent’s behavior. Imitation learning techniques leverage finance-specific knowledge to enhance algorithmic trading consistency. This approach enables algorithms to mimic and adapt human decision-making patterns in finance, ultimately improving overall performance. This paper also introduces a novel return-based function that balances short-term and long-term returns over flexible time horizons: it considers the maximum return from different positions and uses flexible time windows to capture trends while maximizing returns.
Finally, evaluation on six commonly used datasets, such as DJI and SP500, demonstrates the advantages of the proposed HA3C algorithm over other classical and reinforcement learning-based strategies. Notably, on the HSI dataset, the HA3C strategy significantly outperforms the other methods, achieving a cumulative return of 681.55% and a Sharpe ratio of 5.07. These results show the superior performance of the HA3C algorithm in enhancing stock trading strategies and its potential to improve algorithmic trading consistency by aligning agent behavior with human expertise.
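For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how cumulative return and Sharpe ratio are commonly computed from a series of per-period returns. It is not taken from the paper: the function names, the zero risk-free rate, the 252-period annualization factor, and the use of the sample standard deviation are all assumptions, and the authors' exact conventions may differ.

```python
import math


def cumulative_return(period_returns):
    """Compound simple per-period returns into a total return (0.5 means 50%)."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(period_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from simple per-period returns.

    Assumptions (not stated in the abstract): the risk-free rate is an
    annual figure spread evenly over the periods, volatility is the sample
    standard deviation, and annualization uses sqrt(periods_per_year).
    """
    n = len(period_returns)
    if n < 2:
        raise ValueError("need at least two returns to estimate volatility")
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in period_returns]
    mean = sum(excess) / n
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return float("nan")  # undefined when returns never vary
    return mean / std * math.sqrt(periods_per_year)
```

Under this convention, a reported cumulative return of 681.55% corresponds to `cumulative_return(...)` returning 6.8155, that is, the final portfolio value is 7.8155 times the initial value.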

Full Text