Abstract
Gradient-free neural network training is attracting increasing attention because it avoids the gradient vanishing issue that arises in traditional gradient-based training. State-of-the-art gradient-free methods introduce a quadratic penalty or an equivalent approximation of the activation function to train the network without gradients, but they can hardly extract effective signal features since the activation function is a limited nonlinear transformation. In this paper, we first propose to formulate neural network training as a deep dictionary learning model, which enables gradient-free training of the network. To further enhance the feature-extraction ability of gradient-free training, we introduce a logarithmic sparsity regularizer that induces accurately sparse activations on every hidden layer except the last. We then employ a proximal block coordinate descent method to update the variables of each layer in a forward pass and apply a log-thresholding operator to solve the resulting non-convex and non-smooth subproblems. Finally, numerical experiments on several publicly available datasets show that sparse representation of the inputs is effective for gradient-free neural network training.
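To illustrate the kind of non-convex proximal step the abstract refers to, the sketch below implements an elementwise log-thresholding operator for the scalar penalty lam * log(theta + |x|), a common closed form in the sparse-coding literature. The parameters lam and theta are hypothetical, and the authors' exact regularizer and operator may differ; this is a minimal sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

def log_threshold(y, lam, theta):
    """Elementwise proximal operator of the penalty lam * log(theta + |x|).

    For each entry y_i, solves (assumed form, not necessarily the paper's):
        min_x 0.5 * (x - y_i)**2 + lam * log(theta + |x|)
    The stationarity condition for x sharing the sign of y_i reduces to the
    quadratic x**2 + (theta - |y|) x + (lam - theta * |y|) = 0; the larger
    root is compared against x = 0 and the better of the two is returned.
    """
    y = np.asarray(y, dtype=float)
    abs_y = np.abs(y)
    # Discriminant of the quadratic above: (|y| + theta)^2 - 4*lam
    disc = (abs_y + theta) ** 2 - 4.0 * lam
    x = np.zeros_like(y)
    # Larger root of the quadratic: the candidate nonzero minimizer
    root = 0.5 * ((abs_y - theta) + np.sqrt(np.maximum(disc, 0.0)))
    root = np.maximum(root, 0.0)
    # Keep the nonzero root only where it beats x = 0 in objective value
    f_root = 0.5 * (root - abs_y) ** 2 + lam * np.log(theta + root)
    f_zero = 0.5 * abs_y ** 2 + lam * np.log(theta)
    keep = (disc > 0) & (f_root < f_zero)
    x[keep] = np.sign(y[keep]) * root[keep]
    return x

# Example: sparsify a hidden-layer pre-activation vector
if __name__ == "__main__":
    z = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])
    print(log_threshold(z, lam=0.5, theta=0.1))
```

In a proximal block coordinate descent scheme of the type the abstract describes, an operator like this would be applied layer by layer in the forward update of the hidden activations, while the dictionary (weight) blocks are updated by their own subproblems.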