Abstract

Portfolio selection is an important application of AI in finance that has attracted considerable attention from both academia and industry. A key challenge in this application is modeling the correlation among assets in the portfolio, and existing studies handle this challenge poorly because the complex nonlinearity of the correlation is difficult to analyze. To better tackle this issue, this paper proposes a policy network that models the nonlinear correlation using the self-attention mechanism. In addition, a deterministic policy gradient recurrent reinforcement learning method based on Monte Carlo sampling, with cumulative return as its objective function, is constructed to train the policy network. Most existing reinforcement learning-based studies treat the state transition probability as unknown, so the value function of a policy can only be estimated. We show that in the financial backtest setting the state transition probability of the portfolio problem is known, so the value function can be obtained directly by sampling; on this basis, we theoretically prove the optimality of the proposed reinforcement learning method for portfolio selection. Finally, the superiority and generality of our approach are demonstrated through comprehensive experiments on a cryptocurrency dataset, the S&P 500 stock dataset, and an ETF dataset.
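The abstract does not specify the network architecture in detail. Purely as an illustration of the core idea, the sketch below shows how scaled dot-product self-attention over per-asset feature vectors could produce portfolio weights, letting each asset's score depend nonlinearly on every other asset. All function names, shapes, and parameters here are assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_portfolio_weights(features, Wq, Wk, Wv, w_out):
    """Map per-asset feature vectors to long-only portfolio weights.

    features: (n_assets, d) matrix of asset features (e.g. recent returns).
    Wq, Wk, Wv: (d, d_k) projection matrices (hypothetical learned parameters).
    w_out: (d_k,) scoring vector turning attended features into asset scores.
    """
    Q, K, V = features @ Wq, features @ Wk, features @ Wv
    # Scaled dot-product self-attention: each asset attends to all others,
    # which is how cross-asset (nonlinear) correlation enters the policy.
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    attended = attn @ V                      # (n_assets, d_k)
    # Final softmax allocates a nonnegative weight to each asset (sums to 1).
    return softmax(attended @ w_out, axis=-1)

rng = np.random.default_rng(0)
n_assets, d, d_k = 4, 8, 16
weights = attention_portfolio_weights(
    rng.normal(size=(n_assets, d)),
    rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k)),
    rng.normal(size=(d, d_k)), rng.normal(size=d_k),
)
print(weights.shape, round(float(weights.sum()), 6))
```

In a reinforcement learning setup like the one the abstract describes, these projection matrices would be the policy parameters, trained by policy gradient on the cumulative return of the resulting weight sequence.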
