In this paper, we study the high-probability convergence of AdaGrad-Norm for constrained, non-smooth, weakly convex optimization under both bounded-noise and sub-Gaussian-noise settings. We also investigate a more general accelerated gradient descent (AGD) template (Ghadimi and Lan, 2016) that encompasses AdaGrad-Norm, Nesterov's accelerated gradient descent, and RSAG (Ghadimi and Lan, 2016) under different parameter choices. We establish a high-probability convergence rate of Õ(1/T) without requiring knowledge of the weak convexity parameter or the gradient bound to tune the step sizes.
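For concreteness, a minimal sketch of the AdaGrad-Norm update referred to above, written in its standard form (the paper's exact projection step and constants are assumptions here):
\[
b_{t+1}^2 = b_t^2 + \|g_t\|^2, \qquad
x_{t+1} = \Pi_{\mathcal{X}}\!\Big(x_t - \tfrac{\eta}{b_{t+1}}\, g_t\Big),
\]
where $g_t$ is a stochastic subgradient at $x_t$, $\Pi_{\mathcal{X}}$ denotes projection onto the constraint set, and $\eta, b_0 > 0$ are constants chosen without knowledge of the weak convexity parameter or the gradient bound.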