The replica trick (RT) of statistical mechanics has become a very useful tool for the investigation of complex systems in general and, in particular, for studying learning and generalization processes in neural networks (NN) [1,2]. The trick overcomes the difficulty of performing an ensemble average of the logarithm of the partition function Z: one instead averages Z^n over n replicas of the original network, treating n as an integer, and analytically continues the result to the limit n → 0.

For feedforward, single-layered NN [3,4] the RT ansatz has proved to be a valuable tool for studying the learning of a rule on the basis of a suitable set of examples. In particular, the full quenched theory has been carefully studied within the framework of a replica-symmetric approximation [5]. However, as the temperature drops, symmetry breaking may invalidate this approximation, so that an approach to the full quenched theory that incorporates disorder effects due to "improper" examples may be of some interest. In the present effort we study, in the spirit of a second-order approximation, noise effects in the training set, which are unavoidable in any realistic setting. The noise here results from letting only part of the examples be produced by the perceptron teacher (PT); the rest are randomly selected ("bad" examples; a minimal formalization is sketched below).

Two types of situation are to be confronted: learnable and unlearnable rules [6]. For the former, there is at least one vector in the concomitant weight space that can learn the rule exactly. The latter arises mostly in cases of architectural mismatch; in such a situation the training error can never vanish. The question to be answered is: can a perceptron trained under these circumstances correctly respond to queries posed by a PT? In other words, is the rule underlying the "good" examples a learnable one? We will show here that these questions can be adequately dealt with.

The paper is organized as follows. Section II is devoted to a brief recapitulation of basic concepts concerning the RT, while Sec. III deals with the thermodynamics of the situation that interests us here; a second-order, high-temperature approximation is derived. Learning curves for Boolean perceptrons with Ising weights are the subject of Sec. IV. Finally, some conclusions are drawn in Sec. V.

II. THE REPLICA METHOD
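In its standard form, the trick rests on the identity

\[
\langle \ln Z \rangle = \lim_{n \to 0} \frac{\langle Z^n \rangle - 1}{n},
\]

where the angular brackets denote the quenched average over the training examples. The right-hand side is evaluated for integer n, for which \(\langle Z^n \rangle\) is the averaged partition function of n noninteracting copies (replicas) of the network, and the result is then continued analytically to n → 0.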
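As a minimal formalization of the training-set noise described above (the fraction ρ of teacher-generated examples and the equiprobable ±1 choice for the random labels are introduced here purely for illustration), the label σ_μ attached to input x_μ may be written as

\[
\sigma_\mu =
\begin{cases}
\operatorname{sgn}(\mathbf{J}_T \cdot \mathbf{x}_\mu) & \text{with probability } \rho \quad \text{("good" example),}\\[2pt]
\pm 1 \text{ equiprobably} & \text{with probability } 1 - \rho \quad \text{("bad" example),}
\end{cases}
\]

with \(\mathbf{J}_T\) the weight vector of the Boolean PT.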