Abstract
We study a dynamic mean-variance portfolio optimization problem within a reinforcement learning framework, where an entropy regularizer is introduced to induce exploration. Because the mean-variance criterion is time-inconsistent, we aim to learn an equilibrium strategy. In an incomplete market setting, we obtain a semi-analytical, exploratory, equilibrium mean-variance strategy, which turns out to follow a Gaussian distribution. We then focus on a Gaussian mean return model and propose a reinforcement learning algorithm to find the equilibrium strategy. Thanks to a carefully designed policy iteration procedure, we prove that the algorithm converges under mild conditions, even though the dynamic programming principle and the usual policy improvement theorem fail to hold for an equilibrium solution. Numerical experiments demonstrate the algorithm.