In deep learning, finding flat minima of the loss function is an active research topic for improving generalization. Existing methods typically find flat minima with sharpness minimization algorithms, but these methods lack flexibility in both optimization and generalization because they ignore the loss value. This article explores sharpness minimization algorithms for neural networks theoretically and experimentally. First, a novel scale-invariant sharpness measure, called scale-adaptive central moment sharpness (SA-CMS), is proposed; it is not only scale-invariant but also clearly characterizes the nature of the loss surface. Based on this measure, the article derives a new regularization term by integrating sharpness of different orders; in particular, this term covers a host of sharpness minimization objectives, such as local entropy. The central moment sharpness generating function is then introduced as a new objective function, and theoretical analyses indicate that this objective has a smoother landscape and prefers converging to flat local minima. Furthermore, a computationally efficient two-stage algorithm is developed to minimize the objective function. Compared with other algorithms, the two-stage loss-sharpness minimization (TSLSM) algorithm offers a more flexible optimization target across training stages. On a variety of learning tasks with both small and large batch sizes, the algorithm is more universal and effective, matching or surpassing the generalization performance of state-of-the-art sharpness minimization algorithms.
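To make the ideas in the abstract concrete, the sketch below illustrates, under stated assumptions, a Monte-Carlo k-th central moment of the loss over Gaussian weight perturbations as a sharpness proxy, plugged into a two-stage schedule (stage 1: plain loss minimization; stage 2: loss plus sharpness penalty). The symbols `sigma`, `k`, `n_samples`, and `lam` are illustrative assumptions and not the paper's exact SA-CMS or TSLSM definitions.

```python
# Hypothetical sketch: k-th central moment of the loss under Gaussian weight
# perturbations as a sharpness proxy, used in a two-stage training step.
# Not the paper's exact SA-CMS/TSLSM formulation.
import torch


def _perturb(model, noises, sign):
    """Add (sign=+1) or remove (sign=-1) a fixed set of Gaussian perturbations."""
    with torch.no_grad():
        for p, n in zip(model.parameters(), noises):
            p.add_(sign * n)


def sharpness_penalty_backward(model, loss_fn, x, y,
                               sigma=0.05, k=2, n_samples=4, lam=0.1):
    """Accumulate gradients of lam * E[(L(w + eps) - mean)^k] into p.grad.

    The mean loss is treated as a constant (computed in a first no-grad pass),
    and gradients evaluated at w + eps are applied to w, a SAM-style
    approximation chosen here only to keep the sketch short.
    """
    noises = [[sigma * torch.randn_like(p) for p in model.parameters()]
              for _ in range(n_samples)]

    # First pass: loss values at each perturbed point, no gradients.
    with torch.no_grad():
        vals = []
        for eps in noises:
            _perturb(model, eps, +1)
            vals.append(loss_fn(model(x), y))
            _perturb(model, eps, -1)
        mean = torch.stack(vals).mean()

    # Second pass: differentiable penalty, one perturbation at a time.
    for eps in noises:
        _perturb(model, eps, +1)
        term = lam * (loss_fn(model(x), y) - mean) ** k / n_samples
        term.backward()
        _perturb(model, eps, -1)


def two_stage_step(model, opt, loss_fn, batch, stage):
    """Stage 1: minimize the loss alone; stage 2: loss + sharpness penalty."""
    x, y = batch
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    if stage == 2:
        sharpness_penalty_backward(model, loss_fn, x, y)
    opt.step()
    return loss.item()
```

The two-pass structure mirrors the abstract's separation of loss value and sharpness: early training targets the loss alone, while later training additionally penalizes how much the loss fluctuates under weight perturbations.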