Abstract

Linear fixed-point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak–Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information-theoretic methods, we also establish lower bounds showing that neither of these terms can be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.

Funding: This work was partially supported by grants from the Office of Naval Research [Grant DOD-ONRN00014-18-1-2640] and the National Science Foundation (NSF) [NSF-IIS Grant 1909365, NSF-DMS Grant 2015454, and NSF-CCF Grant 1955450] to M. J. Wainwright. Part of this work was performed when A. Pananjady was visiting the Simons Institute for the Theory of Computing, where he was supported by a Swiss Re Research Fellowship.

Supplemental Material: The online supplementary file is available at https://doi.org/10.1287/moor.2022.1341.
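As a concrete illustration of the class of methods the abstract refers to, the sketch below runs averaged TD(0), a linear stochastic-approximation recursion with Polyak–Ruppert iterate averaging, on a small synthetic Markov reward process. This is a hedged sketch rather than the paper's exact scheme: the transition matrix `P`, reward vector `r`, discount `gamma`, feature matrix `Phi`, step-size schedule, and the helper `td0_averaged` are all illustrative assumptions.

```python
# Minimal sketch (not the paper's exact algorithm): TD(0) with linear
# function approximation plus Polyak–Ruppert iterate averaging.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite Markov reward process under a fixed policy.
n_states, d = 5, 2                                    # states, feature dimension
P = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transitions
r = rng.standard_normal(n_states)                     # rewards
gamma = 0.9                                           # discount factor
Phi = rng.standard_normal((n_states, d))              # feature matrix (rows = features)

def td0_averaged(n_iters=50_000, eta=0.5):
    """Run the TD(0) linear stochastic-approximation recursion and
    return the Polyak–Ruppert average of the iterates."""
    theta = np.zeros(d)       # current iterate
    theta_bar = np.zeros(d)   # running average of iterates
    s = rng.integers(n_states)
    for t in range(1, n_iters + 1):
        s_next = rng.choice(n_states, p=P[s])
        # Semi-gradient TD(0) step on the Bellman residual.
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + eta * t ** (-0.75) * td_error * Phi[s]  # polynomially decaying step size
        # Incremental update of the Polyak–Ruppert average.
        theta_bar += (theta - theta_bar) / t
        s = s_next
    return theta_bar

theta_hat = td0_averaged()
print("Averaged TD(0) value estimates:", Phi @ theta_hat)
```

The averaged iterate `theta_bar`, rather than the last iterate `theta`, is the estimator whose mean-squared error the abstract's two-term bound (approximation error plus projected statistical error) describes.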
