Abstract
Theoretical and empirical evidence highlights a positive correlation between the flatness of loss landscapes around minima and generalization. However, most current approaches that seek flat minima either incur high computational costs or struggle to balance generalization, training stability, and convergence. This work proposes reshaping the loss landscape to guide the optimizer toward flat regions, an approach that has negligible computational cost and does not compromise training stability, convergence, or efficiency. We design nonlinear, loss-dependent reshaping functions, underpinned by theoretical insights, to reshape the loss landscape. To design these functions, we first identify where and how they should be applied. With the aid of recently developed tools in stochastic optimization, theoretical analysis shows that steepening the low-loss landscape improves the rate of escape from sharp minima, while flattening the high- and ultralow-loss landscapes enhances training stability and optimization performance, respectively. Simulations and experiments reveal that the carefully designed reshaping functions not only induce optimizers to find flat minima and improve generalization performance but also stabilize training, promote optimization, and maintain efficiency. Our approach is evaluated on image classification, adversarial robustness, and natural language processing (NLP) tasks and achieves significant improvement in generalization performance with negligible computational cost. We believe that the new perspective introduced in this work will broadly impact the field of deep neural network training. The code is available at https://github.com/LongJin-lab/LLR.
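To make the idea of loss-dependent reshaping concrete, the sketch below shows one hypothetical way such a scheme could be wired into a PyTorch training step. The piecewise function, the thresholds `low` and `high`, and the slopes `steep` and `flat` are illustrative placeholders chosen for this sketch, not the reshaping functions proposed in the paper; it is intended only to illustrate that rescaling the loss by a loss-dependent factor rescales the gradient accordingly, with negligible overhead.

```python
import torch

def reshape_loss(loss: torch.Tensor,
                 low: float = 0.05, high: float = 2.0,
                 steep: float = 3.0, flat: float = 0.3) -> torch.Tensor:
    """Rescale the gradient of a scalar loss by a loss-dependent slope:
    steepen the low-loss region, flatten the ultralow- and high-loss regions.
    Thresholds and slopes here are hypothetical illustrations."""
    l = loss.detach()
    if l < low:          # ultralow loss: flatten to stabilize late training
        slope = flat
    elif l < high:       # low loss: steepen to speed escape from sharp minima
        slope = steep
    else:                # high loss: flatten to keep early training stable
        slope = flat
    # Multiplying by a detached constant rescales the gradient without
    # altering the computational graph: d/dtheta [slope * loss] = slope * dloss/dtheta.
    return slope * loss

# Usage inside an ordinary training step (model, criterion, optimizer assumed):
#   loss = criterion(model(x), y)
#   reshape_loss(loss).backward()
#   optimizer.step()
```

Because the slope is detached, the only extra work per step is a scalar comparison and multiplication, which is consistent with the negligible computational cost claimed above.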