Abstract
We consider a class of statistical estimation problems in which we are given a random data matrix ${\boldsymbol{X}}\in{\mathbb{R}}^{n\times d}$ (and possibly some labels ${\boldsymbol{y}}\in{\mathbb{R}}^{n}$) and would like to estimate a coefficient vector ${\boldsymbol{\theta }}\in{\mathbb{R}}^{d}$ (or possibly a constant number of such vectors). Special cases include low-rank matrix estimation and regularized estimation in generalized linear models (e.g. sparse regression). First-order methods proceed by iteratively multiplying current estimates by ${\boldsymbol{X}}$ or its transpose. Examples include gradient descent and its accelerated variants. Under the assumption that the data matrix ${\boldsymbol{X}}$ is standard Gaussian, Celentano, Montanari, and Wu (2020, Conference on Learning Theory, pp. 1078–1141, PMLR) proved that for any constant number of iterations (matrix-vector multiplications), the optimal first-order algorithm is a specific approximate message passing algorithm (known as ‘Bayes AMP’). The error of this estimator can be characterized in the high-dimensional asymptotics $n,d\to \infty $, $n/d\to \delta $, and provides a lower bound on the estimation error of any first-order algorithm. Here we present a simpler proof of the same result, and generalize it to broader classes of data distributions and of first-order algorithms, including algorithms with non-separable nonlinearities. Most importantly, the new proof does not require constructing an equivalent tree-structured estimation problem, and is therefore amenable to a broader range of applications.
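For illustration, the following minimal Python sketch (our own, not taken from the paper) shows gradient descent for ridge regression as one instance of the first-order methods described above: each iteration accesses the data matrix only through a single multiplication by ${\boldsymbol{X}}$ and a single multiplication by its transpose. Function names, defaults, and the synthetic data setup are illustrative assumptions.

```python
import numpy as np

def gradient_descent_ridge(X, y, lam=0.1, step=None, n_iter=50):
    """Gradient descent for ridge regression: a simple first-order method.

    Each iteration touches the data only through one product with X and
    one product with X.T, matching the class of algorithms in the abstract.
    (Illustrative sketch; parameter names and defaults are chosen here.)
    """
    n, d = X.shape
    if step is None:
        # conservative step size from a spectral-norm bound on the Hessian
        step = 1.0 / (np.linalg.norm(X, ord=2) ** 2 + lam)
    theta = np.zeros(d)
    for _ in range(n_iter):
        residual = X @ theta - y              # one multiplication by X
        grad = X.T @ residual + lam * theta   # one multiplication by X.T
        theta -= step * grad
    return theta

# Example usage on synthetic data with a standard Gaussian design,
# mirroring the Gaussian-matrix setting of the abstract.
rng = np.random.default_rng(0)
n, d = 200, 100
X = rng.standard_normal((n, d)) / np.sqrt(n)
theta_star = rng.standard_normal(d)
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_hat = gradient_descent_ridge(X, y, lam=0.1)
```

The result of Celentano, Montanari, and Wu (and its generalization here) says that within this class of algorithms, for a fixed number of such matrix-vector multiplications, no choice of nonlinearities can beat the error achieved by Bayes AMP in the high-dimensional limit.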