Abstract
A repeatable and deterministic non-random weight initialization method for the convolutional layers of neural networks is examined with the Fast Gradient Sign Method (FGSM). FGSM is used as a technique to measure the effect of the initialization under controlled distortions in transferred learning, varying the numerical similarity of the datasets. The focus is on convolutional layers with earlier learning induced through the use of striped forms for image classification. This provided higher accuracy in the first epoch, with improvements of 3–5% in a well-known benchmark model and of about 10% in a color image dataset (MTARSI2) using a dissimilar model architecture. The proposed method is robust when compared with limit optimization approaches such as Glorot/Xavier and He initialization. Arguably the approach forms a new category of weight initialization methods, as a number-sequence substitution for random numbers without a tether to the dataset. When examined under FGSM with transferred learning at higher distortions (numerically dissimilar datasets), the proposed method is less compromised against the original cross-validation dataset, retaining ~31% accuracy instead of ~9%. This indicates higher retention of the original fitting in transferred learning.
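For context, FGSM produces the controlled distortions referred to above by taking a single gradient step in the direction that increases the loss, x_adv = x + epsilon * sign(grad_x loss). A minimal PyTorch sketch follows; the model, loss function, and epsilon value are illustrative assumptions rather than the paper's exact evaluation pipeline.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=0.1):
    # Single-step FGSM: perturb inputs along the sign of the input gradient.
    # epsilon is an assumed placeholder; the paper varies the distortion level.
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    x_adv = images + epsilon * images.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes inputs scaled to [0, 1]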
Highlights
Convolutional layers in neural networks have been used in Artificial Intelligence (AI) applications and led to the use of multiple layers separated by non-linearity functions
That earlier work [15,16] on perceptron layers proved that a non-random weight initialization method could achieve equal performance and that random numbers are not necessary for the initialization
Wang et al. in 2020 [23] proposed a 2D Principal Component Analysis (2DPCA) approach to the initialization of convolutional networks that adjusts the weight difference values to promote backpropagation
Summary
Convolutional layers in neural networks have been used in Artificial Intelligence (AI) applications and have led to the use of multiple layers separated by non-linearity functions. The weight values are arranged by the number of neurons (in the neurons axis) and show activation correlations at pixel positions (in the pixel activations axis), which helps to generalize in a rule extraction approach, as the pixel activations have been clustered to neighboring weights. Those previous papers [15,16] were confined to perceptron layers, and this paper furthers that work into convolutional networks. That earlier work [15,16] on perceptron layers proved that a non-random weight initialization method could achieve equal performance and that random numbers are not necessary for the initialization. This is also the assertion of Blumenfeld et al. in 2020 [17], in an experiment zeroing some of the weights in a convolutional layer.
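To make the idea of a deterministic, non-random initialization concrete, the sketch below fills each convolutional kernel with horizontal stripes whose amplitudes are drawn from a fixed number sequence rather than a random distribution. This is an illustrative assumption only; the exact striped forms, number sequence, and scaling used in the paper are not specified in this summary.

import torch
import torch.nn as nn

def striped_init_(conv: nn.Conv2d, low=0.1, high=0.5):
    # Deterministic, repeatable initialization: no random number generator involved.
    out_c, in_c, kh, kw = conv.weight.shape
    # A fixed number sequence assigns one stripe amplitude per output channel.
    amps = torch.linspace(low, high, steps=out_c)
    # Horizontal stripes: alternating sign row by row within each kernel.
    stripe = torch.ones(kh, kw)
    stripe[1::2] = -1.0
    with torch.no_grad():
        for o in range(out_c):
            conv.weight[o].copy_(stripe * amps[o])  # broadcast over input channels
        if conv.bias is not None:
            conv.bias.zero_()

# Hypothetical usage: apply to every convolutional layer before training.
# for m in model.modules():
#     if isinstance(m, nn.Conv2d):
#         striped_init_(m)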