Abstract

Recent research has highlighted the invariances, or symmetries, that exist in the weight space of a typical neural network and their negative effect on training, since the Euclidean gradient is not scaling-invariant. Although the symmetry problem can be addressed either by defining a suitable Riemannian gradient that is scale-invariant or by placing appropriate constraints on the weights, both approaches introduce high computational cost. In this paper, we first discuss various invariances, or symmetries, in the weight space, and then propose to address the problem through the scaling invariance of the neural network itself, rather than through scaling-invariant update methods. The motivation behind our method is that the optimized parameter point in the weight space can be moved from an ill-conditioned region to a flat region via a series of node-wise rescalings without changing the function represented by the neural network. Second, we propose scaling-based weight normalization. The proposed method is compatible with commonly used optimization algorithms and works well together with batch normalization. Although our algorithm is very simple, it accelerates convergence, and the additional computational cost it introduces is low. Lastly, experiments show that our proposed method consistently improves the performance of various network architectures on large-scale datasets. Our method outperforms state-of-the-art methods on CIFAR-100, obtaining a test error of 17.18%.
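The node-wise rescaling invariance the abstract relies on can be checked directly: for a ReLU unit, scaling its incoming weights and bias by a positive factor c and its outgoing weights by 1/c leaves the network function unchanged, since relu(c·z) = c·relu(z) for c > 0. The NumPy sketch below is only an illustration of this invariance under assumed shapes and a random two-layer network; it is not the paper's scaling-based weight normalization algorithm.

```python
# Minimal sketch (not the paper's method): node-wise rescaling invariance
# of a two-layer ReLU network. Scaling a hidden unit's incoming weights and
# bias by c > 0 and its outgoing weights by 1/c leaves the function unchanged.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, W2, b2):
    return relu(x @ W1 + b1) @ W2 + b2

# Assumed example network: 8 inputs, 16 hidden ReLU units, 4 outputs.
W1, b1 = rng.standard_normal((8, 16)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((16, 4)), rng.standard_normal(4)
x = rng.standard_normal((5, 8))

# Node-wise rescaling: one positive factor per hidden unit.
c = rng.uniform(0.1, 10.0, size=16)
W1_s, b1_s = W1 * c, b1 * c      # scale incoming weights and bias by c
W2_s = W2 / c[:, None]           # scale outgoing weights by 1/c

# The represented function is unchanged up to floating-point error.
assert np.allclose(forward(x, W1, b1, W2, b2),
                   forward(x, W1_s, b1_s, W2_s, b2))
```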
