We study problem-dependent rates, that is, generalization errors that scale near-optimally with the variance, effective loss, or gradient norms evaluated at the “best hypothesis.” We introduce a principled framework dubbed “uniform localized convergence” and characterize sharp problem-dependent rates for central statistical learning problems. From a methodological viewpoint, our framework resolves several fundamental limitations of existing uniform convergence and localization analysis approaches. It also provides improvements and some level of unification in the study of localized complexities, one-sided uniform inequalities, and sample-based iterative algorithms. In the so-called “slow rate” regime, we provide the first (moment-penalized) estimator that achieves the optimal variance-dependent rate for general “rich” classes; we also establish an improved loss-dependent rate for standard empirical risk minimization. In the “fast rate” regime, we establish finite-sample, problem-dependent bounds that are comparable to precise asymptotics. In addition, we show that iterative algorithms such as gradient descent and first order expectation maximization can achieve optimal generalization error in several representative problems across the areas of nonconvex learning, stochastic optimization, and learning with missing data. Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2021.0076 .
Read full abstract