Abstract

A repeatable and deterministic non-random weight initialization method for the convolutional layers of neural networks is examined with the Fast Gradient Sign Method (FGSM). The FGSM approach is used as a technique to measure the effect of the initialization under controlled distortions in transferred learning, varying the numerical similarity of the datasets. The focus is on convolutional layers, with earlier learning induced through the use of striped forms for image classification. This provided higher accuracy in the first epoch, with improvements of 3–5% in a well-known benchmark model, and of ~10% in a color image dataset (MTARSI2) using a dissimilar model architecture. The proposed method is robust in comparison with limit-optimization approaches such as Glorot/Xavier and He initialization. Arguably the approach forms a new category of weight initialization methods, substituting a number sequence for random numbers, without a tether to the dataset. When examined under the FGSM approach with transferred learning, the proposed method, when used with higher distortions (numerically dissimilar datasets), is less compromised against the original cross-validation dataset, at ~31% accuracy instead of ~9%. This indicates higher retention of the original fitting in transferred learning.
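The abstract uses FGSM as a controlled-distortion probe of the initialization under transferred learning. Below is a minimal sketch of the standard FGSM perturbation step in PyTorch; the model, loss, and epsilon values are placeholders and not the authors' experimental setup.

```python
# Minimal FGSM sketch (PyTorch): perturb inputs by epsilon * sign of the
# loss gradient w.r.t. the input, the standard single-step attack used here
# only to illustrate the controlled-distortion probe described above.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon):
    """Return images perturbed in the direction that increases the loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()  # keep pixel values in a valid range
```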

Highlights

  • Convolutional layers in neural networks are widely used in Artificial Intelligence (AI) applications and have led to the use of multiple layers separated by non-linearity functions

  • Earlier work [15,16] on perceptron layers showed that a non-random weight initialization method could achieve equal performance and that random numbers are not necessary for the initialization

  • Wang et al. (2020) [23] proposed a 2D Principal Component Analysis (2DPCA) approach to the initialization of convolutional networks, adjusting the weight difference values to promote back-propagation


Summary

Introduction

Convolutional layers in neural networks have been widely used in Artificial Intelligence (AI) applications and have led to the use of multiple layers separated by non-linearity functions. The number of neurons (in the neurons axis) shows activation correlations at pixel positions (in the pixel activations axis), which helps generalization in a rule extraction approach, as the pixel activations are clustered to neighboring weights. Those previous papers [15,16] were confined to perceptron layers, and this paper extends that work to convolutional networks. That earlier work [15,16] on perceptron layers showed that a non-random weight initialization method could achieve equal performance and that random numbers are not necessary for the initialization. This is also the assertion of Blumenfeld et al. (2020) [17], in an experiment zeroing some of the weights in a convolutional layer.
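To make the claim concrete that random numbers are not necessary for initialization, the sketch below fills a convolutional layer with a repeatable, deterministic striped pattern. This is only a hypothetical illustration under assumed scaling choices, not the authors' exact scheme; the function name and the Glorot-like scaling factor are assumptions.

```python
# Hypothetical sketch of a deterministic, non-random weight initialization:
# a repeatable "striped" pattern replaces the usual random draw.
import torch
import torch.nn as nn

def striped_conv_init(conv: nn.Conv2d):
    """Fill a Conv2d weight tensor with deterministic horizontal stripes."""
    out_channels, in_channels, kh, kw = conv.weight.shape
    with torch.no_grad():
        for row in range(kh):
            # Alternate positive/negative rows, scaled roughly like
            # Glorot/Xavier so activations stay bounded (assumed choice).
            value = (1.0 if row % 2 == 0 else -1.0) / (in_channels * kh * kw) ** 0.5
            conv.weight[:, :, row, :] = value
        if conv.bias is not None:
            conv.bias.zero_()

layer = nn.Conv2d(3, 16, kernel_size=3)
striped_conv_init(layer)  # identical weights on every run, no RNG involved
```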

Related Work
Contribution and Novelty
The FGSM Transferred Learning Approach
Color Images and Dissimilar Model Architectures
Findings
Discussion and Conclusions
