Portfolio management is an important financial task, which helps to activate the capital market and boost investor confidence by finding a long-term profitable policy. Reinforcement learning is one of the most prospective method for this task. However, the policy learnt by reinforcement learning lacks robustness, because current studies cannot overcome the distribution shift between the dynamics of the current environment and the future environment. To better tackle this issue, we proposed a method that can align the distributions of different environment dynamics in a pre-trained representation space, thereby enhancing the robustness of the optimal policy in future environments. The key insight of this method is to only extract shared representation between the current and the future, which is the high-level latent information that spans across sequences. This information exists everywhere in the historical sequence, so it can be assumed that it will not disappear in the near future, thus align the distribution of different environment dynamics in this representation space. Such representations are learnt by encoding the current and the future into representations and maximizing their mutual information using probabilistic contrastive loss. Experiments demonstrates the superior performance and universality of our method.
Read full abstract