Abstract

The problem of learning from examples in two-layer neural networks is studied within the framework of statistical mechanics. A fully connected committee machine is trained to implement a task which is not linearly separable. The generalization error as a function of the number of training examples per adjustable weight is calculated in the annealed approximation. For both binary and continuous weights we find a first-order transition with a discontinuous drop in the generalization error. The transitions are driven by a specialization of the hidden units, from a symmetric state to one in which each hidden unit of the student network specializes on a corresponding unit of the teacher network. The symmetric states of poor generalization remain metastable even for large training sets.
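To make the setting concrete, the following is a minimal numerical sketch (not the paper's annealed calculation) of the teacher-student scenario described above: a fully connected committee machine whose output is the majority vote over the signs of its hidden-unit fields, with the generalization error estimated by Monte Carlo over random Gaussian inputs. The function names, the sizes K and N, and the use of an untrained random student are illustrative assumptions only.

    import numpy as np

    def committee_output(W, x):
        """Fully connected committee machine: majority vote (sign of the sum)
        over the signs of the K hidden-unit fields W @ x."""
        return np.sign(np.sum(np.sign(W @ x)))

    def generalization_error(W_teacher, W_student, n_samples=20000, rng=None):
        """Monte Carlo estimate of the probability that teacher and student
        disagree on a random input drawn from a standard Gaussian."""
        rng = np.random.default_rng() if rng is None else rng
        N = W_teacher.shape[1]
        disagreements = 0
        for _ in range(n_samples):
            x = rng.standard_normal(N)
            if committee_output(W_teacher, x) != committee_output(W_student, x):
                disagreements += 1
        return disagreements / n_samples

    # Illustrative sizes only: K = 3 hidden units, N = 100 inputs.
    rng = np.random.default_rng(0)
    K, N = 3, 100
    teacher = np.sign(rng.standard_normal((K, N)))   # binary teacher weights
    student = np.sign(rng.standard_normal((K, N)))   # untrained binary student
    print("generalization error of a random student:",
          generalization_error(teacher, student, rng=rng))

In this picture, the "symmetric" state corresponds to every student hidden unit being weakly and equally correlated with all teacher units, while the specialized state pairs each student unit with one teacher unit; the paper's result is that the passage between these two states, as the number of examples per weight grows, is a first-order transition.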
