Abstract
The heavy ball momentum method is a commonly used technique for accelerating training in the machine learning community. However, empirical evidence suggests that the convergence of stochastic gradient descent (SGD) with heavy ball may slow down when the momentum hyperparameter approaches 1. Despite this observation, there are no established theories or solutions that explain and address this issue. In this study, we provide the first theoretical result that elucidates why momentum slows down SGD as the momentum hyperparameter tends to 1. To better understand this inefficiency, our analysis focuses on quadratic convex objectives. Our findings show that momentum accelerates SGD when the momentum hyperparameter is not very close to 1. Conversely, when the momentum hyperparameter approaches 1, momentum impairs SGD and degrades its stability. Based on these theoretical findings, we propose a descending warmup technique for the heavy ball momentum, which retains the advantages of the heavy ball method while overcoming the inefficiency that arises when the momentum tends to 1. Numerical results demonstrate the effectiveness of the proposed SHB-DW algorithm.
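As a rough illustration of the setting, the sketch below implements the classical stochastic heavy ball (SHB) update with a pluggable momentum schedule. The abstract does not specify the descending-warmup schedule used in SHB-DW, so the linear ramp shown here is only a hypothetical placeholder, as is the NumPy-based gradient oracle `grad`.

```python
import numpy as np

def shb_sketch(grad, x0, lr=0.01, beta_max=0.9, warmup_steps=100, num_steps=1000):
    """Stochastic heavy ball (SHB) with a placeholder momentum schedule.

    `grad(x)` returns a stochastic gradient estimate at `x`. The linear ramp
    below merely keeps the momentum coefficient away from 1 early on; it is
    NOT the descending-warmup schedule proposed in the paper.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                              # momentum (velocity) buffer
    for t in range(num_steps):
        beta = beta_max * min(1.0, t / warmup_steps)  # hypothetical schedule
        g = grad(x)                                   # stochastic gradient estimate
        v = beta * v - lr * g                         # heavy ball velocity update
        x = x + v                                     # parameter update
    return x

# Example usage on a toy quadratic objective f(x) = 0.5 * ||x||^2 with noisy gradients.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.01 * rng.standard_normal(x.shape)
x_final = shb_sketch(noisy_grad, x0=np.ones(10))
```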