In this work, we establish a linear convergence estimate for gradient descent with delay $\tau$ when the cost function is $\mu$-strongly convex and $L$-smooth. This result improves upon the well-known estimates in [Y. Arjevani, O. Shamir and N. Srebro, A tight convergence analysis for stochastic gradient descent with delayed updates, Proc. Mach. Learn. Res. 117 (2020) 111–132; S. U. Stich and S. P. Karimireddy, The error-feedback framework: Better rates for SGD with delayed gradients and compressed updates, J. Mach. Learn. Res. 21(1) (2020) 9613–9648] in the sense that it is non-ergodic and holds under a weaker assumption on the cost function. Moreover, the admissible range of the learning rate $\eta$ is extended from [Formula: see text] to [Formula: see text] for [Formula: see text] and to [Formula: see text] for [Formula: see text], where $L$ is the Lipschitz constant of the gradient of the cost function. Furthermore, we show linear convergence of the cost function under the Polyak–Łojasiewicz (PL) condition, for which the admissible learning rate is further improved to [Formula: see text] for large delay $\tau$. The proof framework for this result also extends to stochastic gradient descent with time-varying delay under the PL condition. Finally, numerical experiments are provided to confirm the theoretical results.
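For illustration, below is a minimal sketch of the delayed gradient descent iteration discussed above, assuming the standard update rule $x_{k+1} = x_k - \eta \nabla f(x_{k-\tau})$; the quadratic cost, the function names and the step size in the example are illustrative assumptions and do not reproduce the specific bounds established in the paper.

import numpy as np

def delayed_gradient_descent(grad, x0, eta, tau, num_iters):
    # Delayed update rule: x_{k+1} = x_k - eta * grad(x_{k - tau}),
    # with x_{k - tau} taken as x_0 for the first tau iterations.
    history = [np.array(x0, dtype=float)]
    x = history[0].copy()
    for k in range(num_iters):
        delayed_iterate = history[max(k - tau, 0)]
        x = x - eta * grad(delayed_iterate)
        history.append(x.copy())
    return x

if __name__ == "__main__":
    # Example cost: f(x) = 0.5 * x^T A x, which is mu-strongly convex and
    # L-smooth with mu = 1 and L = 10 for the diagonal A below.
    A = np.diag([1.0, 10.0])
    grad = lambda x: A @ x
    L, tau = 10.0, 3
    eta = 1.0 / (L * (tau + 1))   # conservative illustrative step size, not the paper's bound
    x_final = delayed_gradient_descent(grad, np.array([5.0, -3.0]), eta, tau, 500)
    print(np.linalg.norm(x_final))  # should be close to 0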