Abstract

This letter analyzes the performance of the neural network training method known as optimization layer by layer (OLL). We show, from theoretical considerations, that the amount of work required with OLL learning scales as the third power of the network size, compared with the square of the network size for commonly used conjugate gradient (CG) training algorithms. This theoretical estimate is confirmed with a practical example. Thus, although OLL is shown to work very well for small neural networks (fewer than about 500 weights per layer), it is slower than CG for large neural networks. We further show that OLL does not always improve on the accuracy that can be obtained with CG; the final accuracy appears to depend strongly on the initial network weights.
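To make the scaling claim concrete, the following minimal Python sketch (not from the paper) compares a cubic cost model for OLL with a quadratic cost model for CG. The constant factors c_oll and c_cg are purely illustrative assumptions, chosen here so that the crossover falls near 500 weights per layer as stated in the abstract; the actual constants depend on the network and implementation.

```python
# Illustrative sketch only: cost models assumed from the abstract's scaling
# claim (OLL work ~ cubic in layer size, CG work ~ quadratic).
# The constants c_oll and c_cg are hypothetical, not taken from the paper.

def oll_work(weights_per_layer: int, c_oll: float = 1.0) -> float:
    """Hypothetical per-training work for OLL: cubic in weights per layer."""
    return c_oll * weights_per_layer ** 3

def cg_work(weights_per_layer: int, c_cg: float = 500.0) -> float:
    """Hypothetical per-training work for CG: quadratic in weights per layer."""
    return c_cg * weights_per_layer ** 2

if __name__ == "__main__":
    for n in (100, 250, 500, 1000, 2000):
        ratio = oll_work(n) / cg_work(n)
        print(f"{n:5d} weights/layer: OLL/CG work ratio ~ {ratio:.2f}")
    # With these assumed constants the ratio passes 1 near 500 weights per
    # layer, mirroring the observation that OLL becomes slower than CG
    # beyond roughly that network size.
```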
