Gradient Descent Regularization Research Articles

This study discusses the negative impact of the derivative of the activation function in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation. It is implemented through an alternative loss function for gradient descent, which we refer to as the Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer in which the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful when training shallow neural networks and deep neural networks in which saturating activation functions are used only in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as is often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary combinations of loss and activation function that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, performs similarly to regular gradient descent, and performs much better than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent performs as well as or better than gradient descent.
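The effect of disregarding the activation derivative can be illustrated with a minimal sketch for a single sigmoid output unit. The function name `output_updates`, the squared-error loss used for the gradient-descent step, and the learning rate are assumptions made for illustration, and centering is omitted; this is not the authors' implementation.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def output_updates(w, x, t, lr=0.1):
    """Return (gradient-descent step, Hebbian-descent-style step) for one sigmoid unit.

    Assumption: the gradient-descent step uses the loss 0.5 * (y - t)**2.
    """
    a = sum(wi * xi for wi, xi in zip(w, x))    # pre-activation
    y = sigmoid(a)                              # unit output
    err = y - t                                 # output error
    dphi = y * (1.0 - y)                        # derivative of the sigmoid at a
    gd = [-lr * err * dphi * xi for xi in x]    # chain rule keeps the derivative
    hd = [-lr * err * xi for xi in x]           # derivative disregarded
    return gd, hd

# A saturated unit (pre-activation 20) with the wrong target.
gd, hd = output_updates(w=[10.0, 10.0], x=[1.0, 1.0], t=0.0)
print(gd[0], hd[0])
```

For this saturated unit the gradient-descent step is roughly 1e-10 in magnitude, while the step without the derivative is roughly 0.1, which is the vanishing error signal the abstract describes. The second step has the same form as the gradient of the cross-entropy loss with a sigmoid output, one of the established pairings the abstract says are subsumed.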

This paper considers the blind deconvolution of multiple modulated signals/filters and an arbitrary filter/signal. Multiple inputs $\boldsymbol{s}_1, \boldsymbol{s}_2, \ldots, \boldsymbol{s}_N =: [\boldsymbol{s}_n]$ are modulated (pointwise multiplied) with random sign sequences $\boldsymbol{r}_1, \boldsymbol{r}_2, \ldots, \boldsymbol{r}_N =: [\boldsymbol{r}_n]$, respectively, and the resultant inputs $(\boldsymbol{s}_n \odot \boldsymbol{r}_n) \in \mathbb{C}^Q$, $n \in [N]$, are convolved against an arbitrary input $\boldsymbol{h} \in \mathbb{C}^M$ to yield the measurements $\boldsymbol{y}_n = (\boldsymbol{s}_n \odot \boldsymbol{r}_n) \circledast \boldsymbol{h}$, $n \in [N] := \{1, 2, \ldots, N\}$, where $\odot$ and $\circledast$ denote pointwise multiplication and circular convolution, respectively. Given $[\boldsymbol{y}_n]$, we want to recover the unknowns $[\boldsymbol{s}_n]$ and $\boldsymbol{h}$. We make the structural assumption that the unknowns $[\boldsymbol{s}_n]$ are members of a known $K$-dimensional (not necessarily random) subspace, and prove that the unknowns can be recovered from sufficiently many observations using a regularized gradient descent algorithm whenever the modulated inputs $\boldsymbol{s}_n \odot \boldsymbol{r}_n$ are long enough, i.e., $Q \gtrsim KN + M$ (to within logarithmic factors and signal dispersion/coherence parameters). Under the bilinear model, this is the first result on multichannel ($N \geq 1$) blind deconvolution with provable recovery guarantees under near-optimal (in the $N = 1$ case) sample complexity estimates and comparatively lenient structural assumptions on the convolved inputs. A neat conclusion of this result is that modulation of a bandlimited signal protects it against an unknown convolutive distortion. We discuss the applications of this result in passive imaging, wireless communication in unknown environments, and image deblurring. A thorough numerical investigation of the theoretical results is also presented using phase transitions, image deblurring experiments, and noise stability plots.
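The measurement model $\boldsymbol{y}_n = (\boldsymbol{s}_n \odot \boldsymbol{r}_n) \circledast \boldsymbol{h}$ can be simulated with a short sketch. It covers only the forward model, not the recovery algorithm. The function names are hypothetical, the sizes are arbitrary, and zero-padding $\boldsymbol{h}$ from length $M$ to length $Q$ before the circular convolution is an assumption the abstract does not state.

```python
import random

def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences (direct O(Q^2) sum)."""
    q = len(a)
    return [sum(a[k] * b[(i - k) % q] for k in range(q)) for i in range(q)]

def measurements(signals, signs, h):
    """Compute y_n = (s_n * r_n, pointwise) circularly convolved with h.

    Assumption: h (length M) is zero-padded to the input length Q.
    """
    q = len(signals[0])
    h_padded = list(h) + [0.0] * (q - len(h))
    return [
        circular_convolve([s * r for s, r in zip(s_n, r_n)], h_padded)
        for s_n, r_n in zip(signals, signs)
    ]

rng = random.Random(0)
N, Q, M = 2, 8, 3  # illustrative sizes only
signals = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(Q)]
           for _ in range(N)]
signs = [[rng.choice((-1, 1)) for _ in range(Q)] for _ in range(N)]  # random sign sequences
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]
y = measurements(signals, signs, h)  # N measurement vectors of length Q
```

The direct sum keeps the sketch dependency-free; for realistic lengths an FFT-based circular convolution would be the usual choice.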

Articles published on Gradient Descent Regularization

Hebbian Descent: A Unified View on Log-Likelihood Learning.

Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm

Perceptron Collaborative Filtering

An Accelerated First-Order Method for Non-convex Optimization on Manifolds

Free-water DTI estimates from single b-value data might seem plausible but must be interpreted with care.

Blind Deconvolution Using Modulated Inputs

Training binary neural networks with knowledge transfer

Regularized gradient descent: a non-convex recipe for fast joint blind deconvolution and demixing

Rapid, robust, and reliable blind deconvolution via nonconvex optimization

Co-registration of intra-operative brain surface photographs and pre-operative MR images

Integrative analysis of cancer prognosis data with multiple subtypes using regularized gradient descent.

An SL(2) Invariant Shape Median

Extracting Grain Boundaries and Macroscopic Deformations from Images on Atomic Scale
