Abstract

Artificial neural networks (ANNs) succeed in many real-world tasks thanks to their strong learning abilities. This paper focuses on theoretical aspects of ANNs in order to better guide the modifications that allow ANNs to absorb the defining features of each scenario. The work also belongs to the line of research devoted to providing mathematical explanations of ANN performance, with special attention to activation functions. The base algorithm is mathematically decoded to analyse the features that activation functions must have, regarding both their impact on the training process and the applicability of the Universal Approximation Theorem. In particular, significant new results are presented that identify which activation functions suffer from some of the usual failings (regarding gradient preservation). To the best of the author's knowledge, this is the first paper that stresses the role of injectivity of activation functions, a property that has received scant attention in the literature despite its strong influence on ANN performance. Along these lines, injective activation functions are characterized in terms of monotonic functions that satisfy the classical contractive condition, a particular case of Lipschitz functions. A summary table is also provided, aimed at documenting how to select the best activation function for each situation.
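As an informal illustration of the contractive condition and the injectivity property mentioned above (not the paper's formal characterization), the following NumPy sketch numerically estimates a Lipschitz constant for a few common activation functions and checks strict monotonicity, which is sufficient for injectivity. The estimate is only a lower bound on the true constant, and the choice of functions and grid is arbitrary.

```python
import numpy as np

def lipschitz_estimate(phi, grid):
    """Max slope between consecutive grid points; only a lower bound on the
    true Lipschitz constant of phi (e.g. tanh's true constant is exactly 1)."""
    vals = phi(grid)
    return float(np.max(np.abs(np.diff(vals)) / np.diff(grid)))

def strictly_monotonic(phi, grid):
    """Strict monotonicity on the grid; a strictly monotonic function is injective."""
    d = np.diff(phi(grid))
    return bool(np.all(d > 0) or np.all(d < 0))

activations = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),           # contractive: constant 1/4
    "tanh": np.tanh,                                           # constant 1, not a strict contraction
    "relu": lambda z: np.maximum(0.0, z),                      # constant 1, not injective (flat for z <= 0)
    "leaky_relu": lambda z: np.where(z > 0.0, z, 0.01 * z),    # constant 1, injective
}

grid = np.linspace(-10.0, 10.0, 20001)
for name, phi in activations.items():
    print(f"{name:>10s}: injective (strictly monotonic on grid) = {strictly_monotonic(phi, grid)}, "
          f"Lipschitz estimate ~ {lipschitz_estimate(phi, grid):.3f}")
```

The contractive condition asks for a Lipschitz constant strictly below 1; on this reading, only the sigmoid in the list above is a strict contraction, while tanh and the (leaky) ReLU sit exactly at the boundary.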

Highlights

  • Forecasting is one of the greatest successes of human beings

  • While a great deal of research aimed at ensuring the stability of learning processes proposes bounding the variables, some authors [11] stress the importance of using bounded activation functions in order to avoid instability (see the sketch after this list)

  • As for injectivity regarding the Universal Approximation Theorem (UAT), we refer to work [21], where a particular class of continuous activation functions φ is introduced: those which satisfy either of the following equivalent conditions: φ is injective and has no fixed points ⇔ either φ(z) > z or φ(z) < z holds for every z ∈ Dom(φ)
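The following NumPy sketch (an informal illustration, not taken from [11] or [21]) inspects both properties on a finite grid for a few common activation functions: the output range, as a proxy for boundedness, and whether φ(z) > z at every grid point or φ(z) < z at every grid point, i.e. the no-fixed-point condition. The function list and grid are arbitrary choices.

```python
import numpy as np

def inspect_activation(name, phi, z):
    """Report the output range over the grid (a bounded activation keeps a small
    range even for large |z|) and whether the no-fixed-point condition holds,
    i.e. phi(z) > z at every grid point or phi(z) < z at every grid point."""
    vals = phi(z)
    above = bool(np.all(vals > z))
    below = bool(np.all(vals < z))
    print(f"{name:>8s}: output range [{vals.min():8.3f}, {vals.max():8.3f}], "
          f"no fixed point on grid = {above or below}")

z = np.linspace(-30.0, 30.0, 6001)
inspect_activation("sigmoid",  lambda t: 1.0 / (1.0 + np.exp(-t)), z)  # bounded in (0, 1); crosses the diagonal -> fixed point
inspect_activation("tanh",     np.tanh, z)                             # bounded in (-1, 1); fixed point at z = 0
inspect_activation("relu",     lambda t: np.maximum(0.0, t), z)        # unbounded above; phi(z) = z for z >= 0
inspect_activation("softplus", lambda t: np.logaddexp(0.0, t), z)      # unbounded above, yet phi(z) > z for every z
```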


Summary

Introduction

Forecasting is one of the greatest successes of human beings. It is the engine that provides solid support for decision making (DM), by simulating a range of future possibilities in order to anticipate potential problems and/or by designing tools that increase the reliability of predictions. This paper may firstly be encompassed within the trend devoted to providing mathematical explanations of ANN performance. An example of this trend is the work on the Universal Approximation Theorem (UAT), which shows that any continuous function on a compact set can be approximated by a fully connected neural network with one hidden layer using a nonpolynomial activation function. A further study of the advantages and disadvantages of activation functions is then performed. These are decisive pieces in the success or failure of ANNs, as we shall see when we explore their determinant features regarding the applicability of the Universal Approximation Theorem. Another reason to carry out this analysis is the considerable influence that the choice of activation function has on the training process.
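A minimal numerical sketch of what the UAT asserts, assuming NumPy: a single hidden layer with the nonpolynomial activation tanh, randomly drawn hidden weights, and output weights fitted by ordinary least squares, approximating an arbitrary continuous target on the compact interval [-1, 1]. This is only an illustration of approximation capacity, not the theorem's proof nor the construction used in the paper; the target function, width and weight scale are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on the compact set [-1, 1] (arbitrary choice).
f = lambda x: np.sin(3.0 * x) + 0.3 * np.cos(9.0 * x)

x = np.linspace(-1.0, 1.0, 400).reshape(-1, 1)
y = f(x)

# One hidden layer with a nonpolynomial activation (tanh).
# Hidden weights/biases are drawn at random; only the output layer is fitted
# here (ordinary least squares), which suffices to illustrate approximation.
width = 200
W = rng.normal(scale=5.0, size=(1, width))
b = rng.normal(scale=5.0, size=(1, width))
H = np.tanh(x @ W + b)                        # hidden activations, shape (400, width)

coef, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights
y_hat = H @ coef

# The maximum error on the grid should be small, in line with the UAT.
print("max |f(x) - network(x)| on the grid:", float(np.max(np.abs(y - y_hat))))
```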

Mathematical Foundations
Theoretical Learning Algorithm
GDM: Gradient Descent Minimum or Cauchy Descent
Training the ANN
The Role of the Activation Function
Derived from the Theoretical Foundations of the Training Process
Influence of Activation Functions on the Training Process
Mainly Used Activation Functions
Practical Learning Algorithm
Conclusions
