Abstract

Classic neural networks suppose trainable parameters to include just weights of neurons. This paper proposes parabolic integrodifferential splines (ID-splines), developed by author, as a new kind of activation function (AF) for neural networks, where ID-splines coefficients are also trainable parameters. Parameters of ID-spline AF together with weights of neurons are vary during the training in order to minimize the loss function thus reducing the training time and increasing the operation speed of the neural network. The newly developed algorithm enables software implementation of the ID-spline AF as a tool for neural networks construction, training and operation. It is proposed to use the same ID-spline AF for neurons in the same layer, but different for different layers. In this case, the parameters of the ID-spline AF for a particular layer change during the training process independently of the activation functions (AFs) of other network layers. In order to comply with the continuity condition for the derivative of the parabolic ID-spline on the interval (x x0, n) , its parameters fi (i= 0,...,n) should be calculated using the tridiagonal system of linear algebraic equations: To solve the system it is necessary to use two more equations arising from the boundary conditions for specific problems. For exam- ple the values of the grid function (if they are known) in the points (x x0, n) may be used for solving the system above: f f x0 = ( 0) , f f xn = ( n) . The parameters Iii+1 (i= 0,...,n−1 ) are used as trainable parameters of neural networks. The grid boundaries and spacing of the nodes of ID-spline AF are best chosen experimentally. The optimal selection of grid nodes allows improving the quality of results produced by the neural network. The formula for a parabolic ID-spline is such that the complexity of the calculations does not depend on whether the grid of nodes is uniform or non-uniform. An experimental comparison of the results of image classification from the popular FashionMNIST dataset by convolutional neural 0, x< 0 networks with the ID-spline AFs and the well-known ReLUx( ) =AF was carried out. The results reveal that the usage x x, ≥ 0 of the ID-spline AFs provides better accuracy of neural network operation than the ReLU AF. The training time for two convolutional layers network with two ID-spline AFs is just about 2 times longer than with two instances of ReLU AF. Doubling of the training time due to complexity of the ID-spline formula is the acceptable price for significantly better accuracy of the network. Wherein the difference of an operation speed of the networks with ID-spline and ReLU AFs will be negligible. The use of trainable ID-spline AFs makes it possible to simplify the architecture of neural networks without losing their efficiency. The modification of the well-known neural networks (ResNet etc.) by replacing traditional AFs with ID-spline AFs is a promising approach to increase the neural network operation accuracy. In a majority of cases, such a substitution does not require to train the network from scratch because it allows to use pre-trained on large datasets neuron weights supplied by standard software libraries for neural network construction thus substantially shortening training time.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call