Abstract

Recurrent Neural Network (RNN) models have been applied in many domains, producing high accuracies on time-dependent data. However, RNNs have long suffered from exploding gradients during training, caused largely by repeated multiplication by the recurrent weight matrix during backpropagation through time. In this context, we propose a variant of the scalar-gated FastRNN architecture, called Scalar Gated Orthogonal Recurrent Neural Networks (SGORNN). SGORNN utilizes orthogonal matrices at the recurrent step. Our experiments evaluate SGORNN using two recently proposed orthogonal parametrizations for the recurrent weights of an RNN. We present a constraint on the scalar gates of SGORNN, which is easily enforced at training time and yields a probabilistic generalization gap bound that grows only linearly with the length of the sequences processed. Next, we provide bounds on the gradients of SGORNN, showing that they cannot explode exponentially through time. Our experimental results on the addition problem confirm that our combination of orthogonal and scalar-gated RNNs outperforms other orthogonal RNNs and the LSTM on long sequences. We further evaluate SGORNN on the HAR-2 classification task, where it improves upon the accuracy of several models while using far fewer parameters than standard RNNs. Finally, we evaluate SGORNN on the Penn Treebank word-level language modeling task, where it again outperforms its related architectures and achieves performance comparable to the LSTM with far fewer parameters. Overall, SGORNN shows higher representation capacity than the other orthogonal RNNs tested, suffers from less overfitting than other models in our experiments, benefits from a reduced parameter count, and alleviates exploding gradients during backpropagation through time.
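
To make the architecture described above concrete, the sketch below shows a scalar-gated recurrent cell with an orthogonally parametrized recurrent matrix. It assumes a FastRNN-style update, h_t = alpha * tanh(W x_t + U h_{t-1}) + beta * h_{t-1}, and uses PyTorch's built-in orthogonal parametrization as a stand-in for the two parametrizations studied in the paper; the class and variable names are illustrative, not the authors' implementation, and the training-time constraint on the scalar gates is omitted.

```python
# Minimal sketch of a scalar-gated orthogonal recurrent cell.
# Assumption: FastRNN-style gating with an orthogonal recurrent matrix U;
# this is not the authors' code or their exact parametrization.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class ScalarGatedOrthogonalCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.W = nn.Linear(input_size, hidden_size)            # input projection
        self.U = nn.Linear(hidden_size, hidden_size, bias=False)
        orthogonal(self.U, "weight")                           # keep U orthogonal
        # Scalar gates; the paper constrains these at training time to obtain
        # its generalization bound (that constraint is not reproduced here).
        self.alpha = nn.Parameter(torch.tensor(0.1))
        self.beta = nn.Parameter(torch.tensor(0.9))

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        candidate = torch.tanh(self.W(x_t) + self.U(h_prev))
        return self.alpha * candidate + self.beta * h_prev


# Usage: unroll the cell over a (seq_len, batch, input_size) sequence.
if __name__ == "__main__":
    cell = ScalarGatedOrthogonalCell(input_size=8, hidden_size=16)
    x = torch.randn(5, 4, 8)
    h = torch.zeros(4, 16)
    for t in range(x.size(0)):
        h = cell(x[t], h)
    print(h.shape)  # torch.Size([4, 16])
```

Because U stays orthogonal, the recurrent Jacobian's spectral norm is controlled, which is the mechanism the gradient bounds in the paper rely on.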
