Abstract

The plateau phenomenon, in which the loss stops decreasing for an extended period during learning, is a long-standing problem in neural network training. Various studies suggest that plateaus are frequently caused by the network being trapped in a singular region of the loss surface, a region that stems from the symmetric structure of neural networks. However, these studies all deal with networks that have a one-dimensional output; networks with multidimensional output have been overlooked. This paper analyzes the learning dynamics of a two-layer perceptron with multidimensional output using a statistical mechanical formalization. We derive order parameters that capture the macroscopic characteristics of the connection weights, together with the differential equations they follow. In a simple setting, we show that singular-region-driven plateaus shrink or vanish when the output is multidimensional. We also find that the more non-degenerate the model is (i.e., the further it is from one-dimensional output), the more the plateaus are alleviated. Furthermore, we show theoretically that singular-region-driven plateaus seldom occur during learning when the weights are initialized to be orthogonal.
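For context, the order parameters referred to above are, in the standard teacher-student setup of the statistical mechanics of on-line learning, typically overlaps of the form sketched below (the notation here is our assumption and may differ from the paper's own definitions):

\[
Q_{ik} = \frac{\mathbf{J}_i \cdot \mathbf{J}_k}{N}, \qquad
R_{in} = \frac{\mathbf{J}_i \cdot \mathbf{B}_n}{N}, \qquad
T_{nm} = \frac{\mathbf{B}_n \cdot \mathbf{B}_m}{N},
\]

where \(\mathbf{J}_i\) denote the student's input-to-hidden weight vectors, \(\mathbf{B}_n\) the teacher's, and \(N\) the input dimension. In the limit \(N \to \infty\) such overlaps obey closed deterministic differential equations in rescaled time, and in the multidimensional-output case the hidden-to-output weights enter as additional matrix-valued order parameters.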
