We consider the unconstrained minimization of the function F, where F = f + g, f is an expectation-valued nonsmooth convex or strongly convex function, and g is a closed, convex, and proper function. (I) Strongly convex f. When f is μ-strongly convex in x, traditional stochastic subgradient (SSG) schemes often display poor behavior, arising in part from noisy subgradients and diminishing steplengths. Instead, we apply a variable sample-size accelerated proximal scheme (VS-APM) to the Moreau envelope of F; we term this scheme (mVS-APM). In contrast with (SSG) schemes, (mVS-APM) utilizes constant steplengths and increasingly exact gradients. We consider two settings. (a) Bounded domains. In this setting, (mVS-APM) displays linear convergence in inexact gradient steps, each of which requires an inner (prox-SSG) scheme. Specifically, (mVS-APM) achieves an optimal oracle complexity in (prox-SSG) steps of [Formula: see text] with an iteration complexity of [Formula: see text] in inexact (outer) gradients of F to achieve an ϵ-accurate solution in mean-squared error, where each inexact gradient is computed via an increasing number of inner (stochastic) subgradient steps; (b) Unbounded domains. In this regime, under an assumption of state-dependent bounds on subgradients, an unaccelerated variant of (mVS-APM) is linearly convergent, where the gradients ∇xF(x) are approximated with increasing accuracy via (SSG) schemes. Notably, this variant also displays an optimal oracle complexity of [Formula: see text]; (II) Convex f. When f is merely convex but smoothable, by suitable choices of the smoothing, steplength, and batch-size sequences, smoothed (VS-APM) (or sVS-APM) achieves an optimal oracle complexity of [Formula: see text] to obtain an ϵ-optimal solution. Our results can be specialized to two important cases: (a) Smooth f.
Since smoothing is no longer required, we observe that (VS-APM) admits the optimal rate and oracle complexity, matching prior findings; (b) Deterministic nonsmooth f. In the nonsmooth deterministic regime, (sVS-APM) reduces to a smoothed accelerated proximal method (s-APM) that is both asymptotically convergent and optimal in that it displays a complexity of [Formula: see text], matching the bound provided by Nesterov in 2005 for producing ϵ-optimal solutions. Finally, (sVS-APM) and (VS-APM) produce sequences that converge almost surely to a solution of the original problem.
Funding: The first and second authors would like to acknowledge support from NSF CMMI-1538605, DOE ARPA-E award DE-AR0001076, and the Gary and Sheila Bello Chair funds.
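The mechanism described for the strongly convex regime (constant-steplength gradient steps on the Moreau envelope, with each gradient obtained from an increasing number of inner stochastic subgradient steps) can be illustrated with a small sketch. This is not the authors' (mVS-APM) scheme: it is an unaccelerated toy version, and the test function f(u) = ||u||_1 + 0.5||u - c||^2, the envelope parameter `eta`, the inner steplength `eta / t`, the growth factor, and all function names are assumptions made for illustration. It relies only on the standard identity that the gradient of the Moreau envelope at x equals (x - prox(x)) / eta.

```python
import math
import random


def sign(v):
    """Sign of a scalar: -1, 0, or 1."""
    return (v > 0) - (v < 0)


def noisy_subgradient(u, c, sigma, rng):
    """Noisy subgradient of the toy function f(u) = ||u||_1 + 0.5*||u - c||^2.

    Zero-mean Gaussian noise stands in for the stochastic oracle (assumption).
    """
    return [sign(ui) + (ui - ci) + rng.gauss(0.0, sigma) for ui, ci in zip(u, c)]


def inexact_prox(x, c, eta, n_inner, sigma, rng):
    """Approximate argmin_u f(u) + ||u - x||^2 / (2*eta) by n_inner
    stochastic subgradient steps with diminishing steplength eta / t."""
    u = list(x)
    for t in range(1, n_inner + 1):
        g = noisy_subgradient(u, c, sigma, rng)
        u = [ui - (eta / t) * (gi + (ui - xi) / eta)
             for ui, gi, xi in zip(u, g, x)]
    return u


def moreau_gradient_method(c, eta=1.0, n_outer=15, n0=10, growth=1.5,
                           sigma=0.1, seed=0):
    """Unaccelerated gradient method on the Moreau envelope of f.

    The outer steplength is constant (eta); the number of inner steps grows
    geometrically so that the envelope gradients become increasingly exact.
    """
    rng = random.Random(seed)
    x = [0.0] * len(c)
    for k in range(n_outer):
        n_inner = math.ceil(n0 * growth ** k)
        u = inexact_prox(x, c, eta, n_inner, sigma, rng)
        grad = [(xi - ui) / eta for xi, ui in zip(x, u)]  # inexact envelope gradient
        x = [xi - eta * gi for xi, gi in zip(x, grad)]    # constant steplength
    return x


def soft_threshold(c, tau):
    """Closed-form minimizer of ||u||_1 * tau + 0.5*||u - c||^2."""
    return [math.copysign(max(abs(ci) - tau, 0.0), ci) for ci in c]
```

For this toy f the minimizer is available in closed form as `soft_threshold(c, 1.0)`, so the iterate returned by `moreau_gradient_method(c)` can be compared against it directly. With outer steplength `eta`, each outer step reduces to an inexact proximal-point update, which is why no diminishing outer steplength is needed.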