Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

Abstract

The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure. GD type optimization schemes can be regarded as temporal discretization methods for the gradient flow (GF) differential equations associated to the considered optimization problem and, in view of this, it seems to be a natural direction of research to first aim to develop a mathematical convergence theory for time-continuous GF differential equations and, thereafter, to aim to extend such a time-continuous convergence theory to implementable time-discrete GD type optimization methods. In this article we establish two basic results for GF differential equations in the training of fully-connected feedforward ANNs with one hidden layer and ReLU activation. In the first main result of this article we establish in the training of such ANNs under the assumption that the probability distribution of the input data of the considered supervised learning problem is absolutely continuous with a bounded density function that every GF differential equation admits for every initial value a solution which is also unique among a suitable class of solutions. In the second main result of this article we prove in the training of such ANNs under the assumption that the target function and the density function of the probability distribution of the input data are piecewise polynomial that every non-divergent GF trajectory converges with an appropriate rate of convergence to a critical point and that the risk of the non-divergent GF trajectory converges with rate 1 to the risk of the critical point. We establish this result by proving that the considered risk function is semialgebraic and, consequently, satisfies the Kurdyka-Łojasiewicz inequality, which allows us to show convergence of every non-divergent GF trajectory.
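
The remark that GD type schemes are temporal discretizations of GF differential equations can be illustrated with a small sketch. Writing the flow as Θ'(t) = −G(Θ(t)), an explicit Euler step with step size γ gives θ_{n+1} = θ_n − γ G(θ_n), which is a plain GD step. The code below is only an illustration under stated assumptions and not the article's setting: the risk is an empirical mean squared error over a few sample points rather than a risk defined through an input density, the target |x − 0.5| is a hypothetical choice, the ReLU derivative at 0 is set to 0 by assumption, and all function names are made up for this example.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def realization(theta, x):
    """One-hidden-layer ReLU network with scalar input and scalar output."""
    w, b, v, c = theta
    return c + sum(vj * relu(wj * x + bj) for wj, bj, vj in zip(w, b, v))

def risk(theta, xs, ys):
    """Empirical mean squared error (assumed stand-in for the population risk)."""
    return sum((realization(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def generalized_gradient(theta, xs, ys):
    """Gradient of the risk; the ReLU derivative at 0 is taken to be 0 (assumption)."""
    w, b, v, c = theta
    h = len(w)
    gw, gb, gv, gc = [0.0] * h, [0.0] * h, [0.0] * h, 0.0
    for x, y in zip(xs, ys):
        err = 2.0 * (realization(theta, x) - y) / len(xs)
        gc += err
        for j in range(h):
            pre = w[j] * x + b[j]
            active = 1.0 if pre > 0.0 else 0.0
            gv[j] += err * relu(pre)
            gw[j] += err * v[j] * active * x
            gb[j] += err * v[j] * active
    return gw, gb, gv, gc

def euler_step(theta, grad, step):
    """One explicit Euler step of theta' = -grad, which is one GD step."""
    w, b, v, c = theta
    gw, gb, gv, gc = grad
    return (
        [wi - step * g for wi, g in zip(w, gw)],
        [bi - step * g for bi, g in zip(b, gb)],
        [vi - step * g for vi, g in zip(v, gv)],
        c - step * gc,
    )

def run(steps=2000, step=0.01, hidden=4, seed=0):
    """Run GD from a random initial value and return the risk after each step."""
    rng = random.Random(seed)
    xs = [i / 20.0 for i in range(21)]   # sample inputs in [0, 1]
    ys = [abs(x - 0.5) for x in xs]      # hypothetical piecewise linear target
    theta = (
        [rng.uniform(-1, 1) for _ in range(hidden)],
        [rng.uniform(-1, 1) for _ in range(hidden)],
        [rng.uniform(-1, 1) for _ in range(hidden)],
        0.0,
    )
    history = [risk(theta, xs, ys)]
    for _ in range(steps):
        theta = euler_step(theta, generalized_gradient(theta, xs, ys), step)
        history.append(risk(theta, xs, ys))
    return history
```

Calling `run()` returns the sequence of risk values along the discrete trajectory; shrinking `step` while increasing `steps` proportionally follows the continuous-time flow more closely, at higher cost.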

Similar Papers
  • Research Article
  • Citations: 20
  • 10.1016/j.jmaa.2022.126601
Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation
  • Aug 17, 2022
  • Journal of Mathematical Analysis and Applications
  • Arnulf Jentzen + 1 more

  • Research Article
  • 10.30917/att-vk-1814-9588-2023-1-4
Diagnosis of dermatophytosis in cats using artificial neural networks
  • Feb 1, 2023
  • Veterinaria i kormlenie
  • А.А Bushmina + 2 more

The purpose of the research presented in this article is to determine the feasibility and evaluate the effectiveness of using a trained neural network in the diagnosis of ringworm. The article analyzes the methods used for diagnosing dermatomycosis in veterinary practice. One actively developing area is the use of artificial neural networks in the diagnosis of animal diseases. The authors have developed a method for diagnosing dermatophytosis using a trained neural network. To identify hair damaged by dermatophyte spores in cats, a trained YOLO v5 artificial neural network was used; it is based on the YOLO architecture (a high-precision artificial neural network), which provides high accuracy and speed of object detection in images. Diagnostics were carried out in three stages. First stage: hair damaged by dermatophyte spores in cats was diagnosed using the trained artificial neural network. Second stage: microscopy by a veterinary specialist of the veterinary center. Third stage: comparison of the data obtained from the trained artificial neural network and from the veterinary specialists. Three comparative experiments were carried out on 20 depersonalized samples with different ratios from healthy and sick animals. Testing of the trichoscopy method using artificial neural networks for diagnosing dermatophyte spore damage to hair in cats showed that, of the 60 samples studied, the trained artificial neural network diagnosed dermatophyte spore damage in 20 samples and a veterinarian in 17. All positive results were confirmed by a mycological laboratory study and identification of the pathogen. It has been established that the use of a trained artificial neural network increases diagnostic efficiency by 15% and reduces the time needed for diagnostic microscopy by 60.3%. The proposed method makes it possible to reduce the time of microscopic examination, improve the accuracy of interpretation of the results, automate methods for identifying causative agents of ringworm in small animals, and take timely measures to treat the animal.

  • Research Article
  • Citations: 48
  • 10.1016/j.cageo.2013.12.013
Comparing large number of metaheuristics for artificial neural networks training to predict water temperature in a natural river
  • Jan 4, 2014
  • Computers & Geosciences
  • Adam P Piotrowski + 4 more

  • Conference Article
  • Citations: 69
  • 10.1109/isms.2010.31
Harmony Search Based Supervised Training of Artificial Neural Networks
  • Jan 1, 2010
  • Ali Kattan + 2 more

This paper presents a novel technique for the supervised training of feed-forward artificial neural networks (ANN) using the Harmony Search (HS) algorithm. HS is a stochastic meta-heuristic that is inspired by the improvisation process of musicians. Unlike Backpropagation, HS is non-trajectory driven. By modifying an existing improved version of HS and adopting a suitable ANN data representation, we propose a training technique where two of the HS probabilistic parameters are determined dynamically based on the best-to-worst (BtW) harmony ratio in the current harmony memory instead of the improvisation count. This would be more suitable for ANN training since parameters and termination would depend on the quality of the attained solution. We have empirically tested and verified our technique by training an ANN on a benchmarking problem. In terms of overall training time and recognition, our results show that our method is superior to both the original improved HS and standard Backpropagation.
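
For readers unfamiliar with Harmony Search, the following is a minimal sketch of the basic textbook form of the algorithm, minimizing a generic objective over a box. It is not the authors' method: the dynamic, BtW-ratio-based parameter adjustment described in the abstract is not implemented, because the abstract does not give its details. The parameter names (`hmcr` for the memory considering rate, `par` for the pitch adjusting rate, `bandwidth`) and their default values are assumptions for illustration.

```python
import random

def harmony_search(objective, dim, lower, upper, memory_size=10,
                   hmcr=0.9, par=0.3, bandwidth=0.05, iterations=2000, seed=0):
    """Basic Harmony Search with fixed parameters (textbook variant)."""
    rng = random.Random(seed)
    # Harmony memory: a small population of candidate solutions and their scores.
    memory = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(memory_size)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for this dimension.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    # Pitch adjustment: small random perturbation.
                    value += rng.uniform(-bandwidth, bandwidth)
            else:
                # Random consideration: draw a fresh value from the range.
                value = rng.uniform(lower, upper)
            new.append(min(upper, max(lower, value)))
        new_score = objective(new)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(memory_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]
```

To train a network with such a search, `objective` would map a flat weight vector to a training error; that encoding is left out here since the abstract only says that "a suitable ANN data representation" is adopted.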

  • Book Chapter
  • 10.4018/978-1-7998-2742-9.ch019
Heuristic Approach Performances for Artificial Neural Networks Training
  • Sep 24, 2020
  • Kerim Kürşat Çevik

This chapter evaluates the performance of heuristic approaches for artificial neural network (ANN) training. For this purpose, software that can perform ANN training was developed using four different algorithms. First, a training system was developed using the back propagation (BP) algorithm, which is the most commonly used method for ANN training in the literature. Then, to compare the performance of this method with the heuristic methods, software that performs ANN training with genetic algorithm (GA), particle swarm optimization (PSO), and artificial immunity (AI) methods was designed. These programs were tested on the breast cancer dataset taken from the UCI (University of California, Irvine) database. When the test results were evaluated, it was seen that the most important difference between the heuristic algorithms and the BP algorithm occurred during the training period. When the training and test durations and performance rates were examined, GA was determined to be the optimal algorithm for ANN training.

  • Conference Article
  • Citations: 5
  • 10.1109/me49197.2020.9286464
Design of Genetic Algorithms for the Simulation-Based Training of Artificial Neural Networks in the Context of Automated Vehicle Guidance
  • Dec 2, 2020
  • Or Aviv Yarom + 2 more

This paper describes the design of a Genetic Algorithm (GA) for intelligent control systems with Artificial Neural Networks (ANNs) in the context of autonomous driving in a model-based and verification-oriented process. First, a summary of the state of the art is given on the use of ANNs and GAs in control engineering. This is followed by an explanation of the design methodology used in this paper. Then the concept of a universal GA for the (simulation-based) training of any common ANNs is presented. Afterwards the design of the GA is explained in detail. Special aspects of parameterization and algorithms are also discussed. Finally, the presented method is validated by an example of a model-based design of a driving function based on an ANN for automated lateral guidance.

  • Conference Article
  • Citations: 5
  • 10.1109/isms.2011.49
A Parallel & Distributed Implementation of the Harmony Search Based Supervised Training of Artificial Neural Networks
  • Jan 1, 2011
  • Ali Kattan + 1 more

The authors have previously published a novel technique for the supervised training of feed-forward artificial neural networks using the Harmony Search algorithm. This paper proposes a parallel and distributed implementation method that speeds up execution in order to address the training of larger pattern-classification benchmarking problems. The proposed method is a hybrid technique that adopts the merits of two common parallel and distributed training methods, namely network partitioning and pattern partitioning. Experimentation is carried out on a large pattern-classification benchmarking problem using two Master-Slave parallel systems: a homogeneous system using a cluster computer and a heterogeneous system using a set of commodity computers connected via a switched network. Results show that the proposed method attains a considerable speedup in comparison to the sequential implementation.

  • Conference Article
  • Citations: 8
  • 10.1109/cicsyn.2011.65
An Enhanced Parallel & Distributed Implementation of the Harmony Search Based Supervised Training of Artificial Neural Networks
  • Jul 1, 2011
  • Ali Kattan + 1 more

The authors have previously published a parallel and distributed implementation method for the supervised training of feed-forward artificial neural networks using the Harmony Search algorithm. That implementation was intended to address the training of larger pattern-classification problems. The implementation platforms included both a homogeneous and a heterogeneous system of Master-Slave processing nodes. The heterogeneous implementation used a node benchmarking score obtained via independent software to determine the load balancing ratios for the different processing nodes. In this paper, an enhanced alternative benchmarking technique is proposed that is based on the actual workload execution times for each heterogeneous processing node. Using the same pattern-classification problem on the same heterogeneous platform setup used in the previous technique, results show that the proposed technique attains a higher speedup than the former.

  • Conference Article
  • Citations: 23
  • 10.1109/icnn.1997.614192
Superior training of artificial neural networks using weight-space partitioning
  • Jun 9, 1997
  • H.V Gupta + 2 more

Linear least squares simplex (LLSSIM) is a new algorithm for batch training of three-layer feedforward artificial neural networks (ANN), based on a partitioning of the weight space. The input-hidden weights are trained using a "multi-start downhill simplex" global search algorithm, and the hidden-output weights are estimated using "conditional linear least squares". Monte-Carlo testing shows that LLSSIM provides globally superior weight estimates with significantly fewer function evaluations than the conventional backpropagation, adaptive backpropagation, and conjugate gradient strategies.
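
The "conditional linear least squares" step rests on a simple observation: once the input-hidden weights are fixed, the network output is linear in the hidden-output weights, so those weights can be obtained by solving a linear least-squares problem. The sketch below illustrates only that step, under assumptions not stated in the abstract: scalar inputs and outputs, tanh hidden units, an output bias, and a tiny ridge term for numerical safety. The global simplex search over the input-hidden weights is not shown, and all names are hypothetical.

```python
import math

def hidden_features(x, hidden_weights):
    """Hidden activations for scalar input x, plus a constant 1 for the output bias."""
    return [math.tanh(w * x + b) for w, b in hidden_weights] + [1.0]

def solve_linear_system(a, rhs):
    """Solve a z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def fit_output_weights(xs, ys, hidden_weights, ridge=1e-9):
    """Least-squares output weights for fixed hidden weights (normal equations)."""
    rows = [hidden_features(x, hidden_weights) for x in xs]
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve_linear_system(gram, rhs)

def sum_squared_error(xs, ys, hidden_weights, output_weights):
    """Training error of the network with the given hidden and output weights."""
    total = 0.0
    for x, y in zip(xs, ys):
        feats = hidden_features(x, hidden_weights)
        pred = sum(f * v for f, v in zip(feats, output_weights))
        total += (pred - y) ** 2
    return total
```

In a partitioned scheme of this kind, an outer search would propose `hidden_weights`, call `fit_output_weights`, and score the proposal with `sum_squared_error`, so the outer search only has to explore the input-hidden part of the weight space.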

  • Research Article
  • 10.17816/dd627076
Potential of a neural network in the diagnosis of laryngeal tumors
  • Jul 3, 2024
  • Digital Diagnostics
  • Evgeniya A Safyannikova + 10 more

BACKGROUND: Currently, artificial intelligence in the form of artificial neural networks is being actively implemented in a number of areas of our lives, including medicine. In particular, in otorhinolaryngology, artificial neural networks are used to analyze images obtained during endoscopic examinations of patients (e.g., videolaryngoscopy) [1–3]. The interpretation of laryngoscopic images often presents significant difficulties for practicing physicians, which reduces the frequency of detection of precancerous laryngeal diseases and contributes to the increase in the number of patients with stage III–IV laryngeal cancer [4, 5]. This underscores the significance of prompt performance and accurate interpretation of the findings of endoscopic examinations of patients with laryngeal disorders. Artificial neural networks can be employed to analyze the results of videolaryngoscopy, furnishing the physician with supplementary information that can enhance diagnostic accuracy and diminish the probability of error [6, 7]. AIM: The study aims to develop and train an artificial neural network for recognizing characteristic features of laryngeal neoplasms and variants of laryngeal normality. MATERIALS AND METHODS: The study was conducted under the grant of the Moscow Center for Innovative Technologies in Healthcare (grant No. 2112-1/22) entitled “Using Neural Networks (Artificial Intelligence Algorithms) for Control and Improving the Quality of Diagnosis and Treatment of Diseases of Laryngeal and Ear Structures through Digital Technologies”. The following methods were used during the course of the study: data collection for the creation of a photobank (dataset) of medical images obtained during videolaryngoscopy; data partitioning for the formation of datasets for individual nosologies and groups of diseases; the method of consilium; analysis of the accuracy of recognition and classification of digital endoscopic images; and training of classification neural networks. Consequently, a dataset comprising 1,471 laryngeal images in digital formats (JPEG, BMP) was assembled, labelled, and uploaded for the purpose of training the artificial neural network. Of the total number of images, 410 were classified as showing laryngeal masses, while 1,061 were classified as variants of normality. Subsequently, the neural network was trained and tested to identify the signs of normality and of laryngeal masses. RESULTS: Testing of the artificial neural network involved constructing a confusion matrix, calculating the recognition accuracy, calculating the quality indicators of the model performance, and constructing the ROC curve. The developed and trained artificial neural network demonstrated an accuracy of 86% in recognizing the signs of laryngeal masses and norms. CONCLUSIONS: This study demonstrates that a trained artificial neural network can successfully distinguish between signs of normality and of laryngeal masses in endoscopic photographs. With further training of the neural network and achievement of high accuracy, this technology can be used in clinical practice as an assistant in the interpretation of laryngoscopic images and early diagnosis of laryngeal masses. It can also be employed to control and improve the quality of diagnosis and treatment of diseases of the throat, nose, and ears by primary care physicians.

  • Research Article
  • Citations: 57
  • 10.1007/s10822-016-9895-2
Improving quantitative structure-activity relationship models using Artificial Neural Networks trained with dropout.
  • Feb 1, 2016
  • Journal of Computer-Aided Molecular Design
  • Jeffrey Mendenhall + 1 more

Dropout is an Artificial Neural Network (ANN) training technique that has been shown to improve ANN performance across canonical machine learning (ML) datasets. Quantitative Structure Activity Relationship (QSAR) datasets used to relate chemical structure to biological activity in Ligand-Based Computer-Aided Drug Discovery pose unique challenges for ML techniques, such as heavily biased dataset composition, and relatively large number of descriptors relative to the number of actives. To test the hypothesis that dropout also improves QSAR ANNs, we conduct a benchmark on nine large QSAR datasets. Use of dropout improved both enrichment false positive rate and log-scaled area under the receiver-operating characteristic curve (logAUC) by 22-46% over conventional ANN implementations. Optimal dropout rates are found to be a function of the signal-to-noise ratio of the descriptor set, and relatively independent of the dataset. Dropout ANNs with 2D and 3D autocorrelation descriptors outperform conventional ANNs as well as optimized fingerprint similarity search methods.
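
As a reminder of what the dropout technique does during training, here is a minimal sketch of the common "inverted dropout" variant applied to a list of activations. Whether the paper uses this exact variant is not stated in the abstract, so the rescaling convention, the function name, and the parameter names are assumptions for illustration.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest.

    Rescaling the kept units by 1 / (1 - rate) keeps the expected activation
    unchanged, so no correction is needed at evaluation time.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

A call such as `dropout(hidden, 0.25, random.Random(0))` would be applied to each hidden layer's output on every training step, while evaluation passes `training=False` and leaves the activations untouched.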

  • Research Article
  • 10.1080/10255810305044
Development of Self-Adaptive Artificial Neural Networks Training Algorithm
  • Jan 1, 2003
  • International Journal of Smart Engineering System Design
  • Shamsuddin Ahmed

This paper introduces an artificial neural network (ANN) training algorithm that computes a directional search vector for rapid convergence in ANN training. A higher-dimensional ANN error function is reduced to a lower dimension, and a mapping scheme is developed to identify self-adaptive learning rates for each neuron to improve training performance. The directional search vector points in the direction of fast training, while the dynamic self-adaptive learning rates identified by the mapping scheme generate a convergent sequence of the error function. As a result, training is faster by a factor of 1.76 than standard backpropagation training on the XOR problem. The learning rates are self-adaptive and change dynamically every epoch; consequently, the oscillation during training is greatly reduced.

  • Research Article
  • Citations: 193
  • 10.1016/j.eswa.2013.10.053
Artificial Neural Network trained by Particle Swarm Optimization for non-linear channel equalization
  • Oct 31, 2013
  • Expert Systems with Applications
  • Gyanesh Das + 2 more

  • Research Article
  • Citations: 1
  • 10.1049/gtd2.13339
Training improvement methods of ANN trajectory predictors in power systems
  • Dec 1, 2024
  • IET Generation, Transmission & Distribution
  • Sangwon Kim

This paper proposes training improvement methods of artificial neural networks (ANN) trajectory predictors. First, a dynamic power system time‐series trajectory is split into several different segments to simplify the original ANN training problem. Moreover, the time‐derivative of the trajectory is included to obtain an augmented loss function. Compared to previous studies which mainly focused on increasing the prediction accuracy, the aim of these novel techniques is to reduce the computational burden where the ANN output performance is still acceptable. The effectiveness of the developed methods is validated based on the WSCC three‐machine nine‐bus and IEEE 39‐bus system models. The mean absolute error (MAE) and trajectory prediction results are analysed, in which the numbers of neurons, hidden layers, and training epochs are constrained during the ANN training process. Rotor‐angle difference between generators and the system frequency are investigated as the dynamic trajectories of the power system models. The approaches are revealed to be effective when the ANN architecture and epochs are constrained. The MAE results can be reduced by up to 65% in the power system models depending on the ANN hyperparameters and training epochs. The ANN training results can better reflect the original trajectory as well.

  • Book Chapter
  • Citations: 6
  • 10.1007/978-3-540-74690-4_30
The Usage of Golden Section in Calculating the Efficient Solution in Artificial Neural Networks Training by Multi-objective Optimization
  • Jan 1, 2007
  • Roselito A Teixeira + 4 more

In this work, the algorithm for training Artificial Neural Networks (NN) of the Multilayer Perceptron (MLP) type based on multi-objective optimization (MOBJ) was modified to increase its computational efficiency. Usually, the number of efficient solutions to be generated is a parameter that must be provided by the user. In this work, this number is determined automatically by an algorithm that uses the golden section; it is generally smaller than when specified by the user, which yields a noticeable reduction in processing time while keeping the high generalization capability of the solution obtained with the original method.
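
For context, the golden section is best known from golden-section search, a derivative-free method for minimizing a unimodal function of one variable. The sketch below shows that generic search only. How the chapter applies the golden section to choose the number of efficient solutions is not described in the abstract, so this is not the authors' algorithm, and the function name and tolerance are assumptions.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618

def golden_section_minimize(f, a, b, tol=1e-6):
    """Approximate the minimizer of a unimodal function f on [a, b]."""
    # Two interior points that split the interval in golden-ratio proportions.
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation of `f` is needed per step while the bracket shrinks by a constant factor of about 0.618.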
