Abstract

Lookahead is a popular stochastic optimizer that can accelerate the training of deep neural networks. However, the solutions found by Lookahead often generalize worse than those found by its base optimizers, such as SGD and Adam. To address this issue, we propose Sharpness-Aware Lookahead (SALA), a novel optimizer that aims to identify flat minima that generalize well. SALA divides the training process into two stages. In the first stage, the direction towards flat regions is determined by leveraging a quadratic approximation of the optimization trajectory, without incurring any extra computational overhead. In the second stage, the direction is instead determined by Sharpness-Aware Minimization (SAM), which is particularly effective at improving generalization in the terminal phase of training. In contrast to Lookahead, SALA retains the benefits of accelerated convergence while also enjoying superior generalization performance compared to the base optimizer. Theoretical analysis of the expected excess risk, as well as empirical results on canonical neural network architectures and datasets, demonstrate the advantages of SALA over Lookahead. It is noteworthy that with approximately 25% more computational overhead than the base optimizer, SALA can achieve the same generalization performance as SAM, which requires twice the training budget of the base optimizer.
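To make the two-stage structure concrete, the sketch below illustrates one plausible reading of the abstract in PyTorch-style Python: an early stage of cheap base-optimizer steps with a periodic trajectory-based correction (a Lookahead-style slow/fast interpolation stands in here for SALA's quadratic approximation, whose exact form is not given in the abstract), followed by a terminal stage of standard SAM updates. All names and hyperparameters (stage1_epochs, k, alpha, rho) are illustrative assumptions, not the authors' published implementation.

import torch

def sam_step(model, loss_fn, batch, base_opt, rho=0.05):
    # One standard SAM update: ascend to a nearby worst-case point, then
    # descend using the gradient evaluated at that perturbed point.
    x, y = batch
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / (grad_norm + 1e-12))   # epsilon = rho * g / ||g||
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / (grad_norm + 1e-12))   # restore original weights
    base_opt.step()                                 # step with perturbed-point gradient
    base_opt.zero_grad()

def train_two_stage(model, loader, loss_fn, base_opt,
                    epochs=100, stage1_epochs=75, k=5, alpha=0.5, rho=0.05):
    # Slow weights for the stage-1 trajectory correction (assumed mechanism).
    slow_weights = [p.detach().clone() for p in model.parameters()]
    step = 0
    for epoch in range(epochs):
        for batch in loader:
            if epoch < stage1_epochs:
                # Stage 1: base-optimizer steps plus a periodic correction
                # toward flatter regions, at no extra gradient cost.
                x, y = batch
                loss_fn(model(x), y).backward()
                base_opt.step()
                base_opt.zero_grad()
                step += 1
                if step % k == 0:
                    with torch.no_grad():
                        for p, s in zip(model.parameters(), slow_weights):
                            s.add_(alpha * (p - s))
                            p.copy_(s)
            else:
                # Stage 2: sharpness-aware steps in the terminal phase.
                sam_step(model, loss_fn, batch, base_opt, rho=rho)

Because the costlier two-gradient SAM step is confined to the final epochs, a schedule like the one above would account for the roughly 25% overhead (versus 100% for SAM throughout) mentioned in the abstract.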
