Abstract

Training Recurrent Neural Networks (RNNs) is challenging due to the vanishing/exploding gradient problem. Recent work addresses this problem by constraining the recurrent transition matrix to be unitary or orthogonal during training, but existing approaches are either limited in capacity or involve time-consuming operations, e.g., differentiating lengthy matrix chain products, the matrix exponential, or the singular value decomposition. This paper addresses the problem from a geometric view, based on the exponentials of sparse antisymmetric matrices with one or more nonzero columns and an equal number of nonzero rows. An analytical expression is presented that simplifies the computation of the sparse antisymmetric matrix exponential and, in effect, provides a novel formula for parameterizing orthogonal matrices. The proposed algorithms are fast, tunable, and full-capacity; the target variable is updated by optimizing a matrix multiplier rather than by explicit gradient descent. Experiments demonstrate the superior performance of our proposed algorithms.
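The sketch below illustrates only the standard background fact underlying the abstract, namely that the exponential of an antisymmetric (skew-symmetric) matrix is orthogonal; it does not reproduce the paper's analytical formula, and the single nonzero column/row structure is an assumed illustrative example of the sparsity pattern described.

```python
# Minimal sketch (not the paper's closed-form expression): if A is antisymmetric,
# i.e. A.T == -A, then exp(A) @ exp(A).T = exp(A) @ exp(-A) = I, so exp(A) is orthogonal.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 6

# Sparse antisymmetric matrix with one nonzero column and one nonzero row
# (assumed illustrative shape matching the sparsity described in the abstract).
v = np.zeros(n)
v[1:] = rng.standard_normal(n - 1)
A = np.zeros((n, n))
A[:, 0] = v
A[0, :] = -v          # enforces A.T == -A

Q = expm(A)           # orthogonal matrix obtained via the matrix exponential
print(np.allclose(Q @ Q.T, np.eye(n)))  # True: Q is orthogonal
```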
