Abstract

This work derives an analytical expression for the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the loss landscape of neural networks. Our result implies that the origin is a special point in the deep neural network loss landscape, where highly nonlinear phenomena emerge. We show that weight decay interacts strongly with the model architecture and can create bad minima at zero in a network with more than one hidden layer, a behavior qualitatively different from that of a network with only one hidden layer. Practically, our result implies that common deep learning initialization methods are generally insufficient to ease the optimization of neural networks.
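To make the depth dependence concrete, the following is a minimal scalar sketch, not the paper's exact setting: a product of scalar weights with L2 weight decay, probed numerically around the origin. The loss form and the values of x, y, and lam are illustrative assumptions. With a weak weight decay, the origin is a saddle for a network with one hidden layer (two weight factors) but becomes a local minimum once there is more than one hidden layer (three or more factors), since all mixed second derivatives of the product term vanish at zero and only the weight-decay curvature remains.

import numpy as np

# Illustrative sketch (not the paper's model): a scalar deep linear
# "network" f(x) = w_D * ... * w_1 * x with L2 weight decay.
def loss(weights, x=1.0, y=1.0, lam=0.1):
    pred = np.prod(weights) * x
    return (pred - y) ** 2 + lam * np.sum(np.asarray(weights) ** 2)

rng = np.random.default_rng(0)
for depth, label in [(2, "one hidden layer"), (3, "two hidden layers")]:
    origin = np.zeros(depth)
    base = loss(origin)
    eps = 1e-3  # probe random directions at radius ~eps around the origin
    deltas = [loss(origin + eps * rng.standard_normal(depth)) - base
              for _ in range(10_000)]
    print(f"{label}: min loss change around origin = {min(deltas):+.2e}")

# Expected output: a negative minimum change for depth 2 (the origin is a
# saddle whenever lam < |x*y|), and a nonnegative one for depth 3 (weight
# decay makes the origin a local minimum for any lam > 0).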
