Νέοι αλγόριθμοι εκπαίδευσης τεχνητών νευρωνικών δικτύων και εφαρμογές

Αριστοτέλης Κωστόπουλος

doi:10.12681/eadd/26873

Abstract

In this dissertation the problem of the training of feedforward artificial neural networks and its applications are considered. The presentation of the topics and the results are organized as follows: In the first chapter, the artificial neural networks are introduced. Initially, the benefits of the use of artificial neural networks are presented. In the sequence, the structure and their functionality are presented. More specifically, the derivation of the artificial neurons from the biological ones is presented followed by the presentation of the architecture of the feedforward neural networks. The historical notes and the use of neural networks in real world problems are concluding the first chapter. In Chapter 2, the existing training algorithms for the feedforward neural networks are considered. First, a summary of the training problem and its mathematical formulation, that corresponds to the uncostrained minimization of a cost function, are given. In the sequence, training algorithms based on the steepest descent, Newton, variable metric and conjugate gradient methods are presented. Furthermore, the weight space, the error surface and the techniques of the initialization of the weights are described. Their influence in the training procedure is discussed. In Chapter 3, a new training algorithm for feedforward neural networks based on the backpropagation algorithm and the automatic two-point step size (learning rate) is presented. The algorithm uses the steepest descent search direction while the learning rate parameter is calculated by minimizing the standard secant equation. Furthermore, a new learning rate parameter is derived by minimizing the modified secant equation introduced by Zhang, that uses both gradient and function value information. In the sequece a switching mechanism is incorporated into the algorithm so that the appropriate stepsize to be chosen according to the status of the current iterative point. Finaly, the global convergence of the proposed algorithm is studied and the results of some numerical experiments are presented. In Chapter 4, some efficient training algorithms, based on conjugate gradient optimization methods, are presented. In addition to the existing conjugate gradient training algorithms, we introduce Perry's conjugate gradient method as a training algorithm. Furthermore, a new class of conjugate gradient methods is proposed, called self-scaled conjugate gradient methods, which are derived from the principles of Hestenes-Stiefel, Fletcher-Reeves, Polak-Ribiere and Perry's method. This class is based on the spectral scaling parameter. Furthermore, we incorporate to the conjugate gradient training algorithms an efficient line search technique based on the Wolfe conditions and on safeguarded cubic interpolation. In addition, the initial learning rate parameter, fed to the line search technique, was automatically adapted at each iteration by a closed formula. Finally, an efficient restarting procedure was employed in order to further improve the effectiveness of the conjugate gradient training algorithms and prove their global convergence. Experimental results show that, in general, the new class of methods can perform better with a much lower computational cost and better success performance. In the last chapter of this dissertation, the Perry's self-scaled conjugate gradient training algorithm that was presented in the previous chapter was isolated and modified. More specifically, the main characteristics of the training algorithm were maintained but in this case a different line search strategy based on the nonmonotone Wolfe conditions was utilized. Furthermore, a new initial learning rate parameter was introduced for use in conjunction with the self-scaled conjugate gradient training algorithm that seems to be more effective from the initial learning rate parameter, proposed by Shanno, when used with the nonmonotone line search technique. In the sequence the experimental results for differrent training problems are presented. Finally, a feedforward neural network with the proposed algorithm for the problem of brain astrocytomas grading was trained and compared the results with those achieved by a probabilistic neural network. The dissertation is concluded with the Appendix A', where the training problems used for the evaluation of the proposed training algorithms are presented.

Full Text