Abstract

We consider the tasks of feature selection and policy evaluation based on linear value function approximation in reinforcement learning. High-dimensional feature vectors combined with a limited number of samples can easily cause over-fitting and expensive computation. To address this problem, $\ell_1$-regularized methods yield sparse solutions and thereby improve generalization performance. We propose an efficient $\ell_1$-regularized recursive least squares-based online algorithm with $O(n^2)$ complexity per time-step, termed $\ell_1$-RC. With the help of a nested optimization decomposition, $\ell_1$-RC solves a series of standard optimization problems and avoids directly minimizing the mean-square projected Bellman error with $\ell_1$-regularization. Within $\ell_1$-RC, we propose RC with iterative refinement to minimize the operator error, and an alternating direction method of multipliers (ADMM) with a proximal operator to minimize the fixed-point error. The convergence of $\ell_1$-RC is established via the ordinary differential equation method, and several extensions are also given. In the empirical evaluation, $\ell_1$-RC is tested on both policy-evaluation and learning-control benchmarks, with several state-of-the-art $\ell_1$-regularized methods as baselines. The results show the effectiveness and advantages of $\ell_1$-RC.
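To make the ADMM-with-proximal-operator step concrete, below is a minimal sketch of solving a generic $\ell_1$-regularized least-squares subproblem with ADMM and element-wise soft-thresholding. This is not the paper's exact fixed-point subproblem; the matrix `A`, vector `b`, regularization weight `beta`, and penalty `rho` are illustrative assumptions, and a direct solve is used where a recursive (Sherman-Morrison style) update would be needed to reach the $O(n^2)$ per-step cost described in the abstract.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_l1_least_squares(A, b, beta, rho=1.0, n_iter=200):
    """Sketch: min_theta 0.5*||A theta - b||^2 + beta*||theta||_1 via scaled ADMM.

    Assumed quadratic-plus-l1 form; stands in for the fixed-point
    subproblem that l1-RC handles with a proximal operator.
    """
    n = A.shape[1]
    z = np.zeros(n)      # auxiliary variable constrained to equal theta
    u = np.zeros(n)      # scaled dual variable
    M = A.T @ A + rho * np.eye(n)   # quadratic-step system matrix
    Atb = A.T @ b
    for _ in range(n_iter):
        theta = np.linalg.solve(M, Atb + rho * (z - u))  # quadratic step
        z = soft_threshold(theta + u, beta / rho)        # proximal (l1) step
        u = u + theta - z                                # dual ascent step
    return z
```

The sparse iterate `z` is the output of interest: the soft-thresholding step sets small coordinates exactly to zero, which is what delivers the feature selection effect the abstract attributes to $\ell_1$-regularization.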
