Abstract
This paper shows how to use results from statistical learning theory and stochastic algorithms to gain a better understanding of the convergence of Reinforcement Learning (RL) once it is formulated as a fixed point problem. This can be used to propose improvements of RL learning rates. First, our analysis shows that the classical asymptotic convergence rate O(1/√n) is pessimistic and can be replaced by O((log(n)/n)^β) with 1/2 ≤ β ≤ 1, and n the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate used in RL. We decompose our policy into two interacting levels: the inner and the outer level. In the inner level, we present the PASS algorithm (for "PAst Sign Search") which, based on a predefined sequence of learning rates, constructs a new sequence for which the error decreases faster. The convergence of PASS is proved and error bounds are established. In the outer level, we propose an optimal methodology for the selection of the predefined sequence. Third, we show empirically that our learning rate selection methodology significantly outperforms standard algorithms used in RL for the following three applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.
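The abstract describes PASS only at a high level. The snippet below is a minimal, illustrative sketch of the kind of sign-based learning-rate policy it evokes, applied to the drift-estimation example mentioned above: when consecutive stochastic increments keep the same sign the step size is held, and when the sign flips the policy falls back to the next term of the predefined decreasing sequence. The function name, the sign-flip rule, and the drift-estimation target are assumptions made for illustration, not the authors' exact algorithm.

```python
import numpy as np

def pass_like_drift_estimation(samples, base_rates):
    """Sketch of a sign-based learning-rate policy (hypothetical reading of PASS).

    samples    : noisy observations whose mean (drift) we estimate.
    base_rates : predefined decreasing sequence of learning rates gamma_n.
    """
    theta = 0.0        # current drift estimate
    idx = 0            # position in the predefined learning-rate sequence
    prev_sign = 0.0    # sign of the previous stochastic increment
    for x in samples:
        increment = x - theta              # stochastic fixed-point residual
        sign = np.sign(increment)
        if sign != prev_sign:              # oscillation detected: decay the rate
            idx = min(idx + 1, len(base_rates) - 1)
        theta += base_rates[idx] * increment   # Robbins-Monro style update
        prev_sign = sign
    return theta

# Usage: estimate the drift of noisy observations with a classical 1/n sequence.
rng = np.random.default_rng(0)
data = 0.3 + rng.normal(scale=1.0, size=5000)
rates = 1.0 / (1.0 + np.arange(5000))
print(pass_like_drift_estimation(data, rates))
```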